About Avanai
Avanai helps enterprises reimagine work with AI agents and automation that deliver measurable results, not surprises. We design, build, and run enterprise-grade AI agents and automated workflows that accelerate decision-making, cut costs, ensure compliance, and unlock ROI in the first year.
The Role
We're looking for a Senior DevOps Engineer with deep expertise in multi-cloud environments (AWS, Azure, and GCP) and bare metal servers to architect, build, and maintain our infrastructure. You'll play a critical role in ensuring our AI-powered platforms are scalable, secure, and highly available, whether deployed in the cloud or on client-owned on-premises servers. This is a hands-on role where you'll design infrastructure as code for hybrid environments, build robust CI/CD pipelines, and implement best practices for both cloud-native and bare metal deployments.
What You'll Do
- Design, implement, and manage cloud infrastructure across AWS, Azure, and GCP, as well as bare metal servers for clients with on-premises requirements.
- Build and maintain CI/CD pipelines using tools like GitHub Actions, GitLab CI, Jenkins, ArgoCD, or similar to enable rapid and reliable deployments.
- Implement Infrastructure as Code (IaC) using Terraform, Pulumi, Ansible, CloudFormation, or Bicep to automate provisioning across cloud and bare metal environments.
- Design and manage Kubernetes clusters (EKS, AKS, GKE, and self-hosted k8s on bare metal) for container orchestration, including Helm charts, service mesh, and autoscaling configurations.
- Set up comprehensive monitoring, logging, and alerting using tools like Prometheus, Grafana, Datadog, ELK Stack, or CloudWatch.
- Implement security best practices including secrets management (HashiCorp Vault, AWS Secrets Manager), network security, IAM policies, and compliance frameworks.
- Manage database infrastructure including PostgreSQL, MongoDB, Redis, and cloud-native databases, ensuring backup, recovery, and performance optimization.
- Collaborate with development teams to optimize application deployments, troubleshoot production issues, and improve system reliability.
- Implement disaster recovery strategies, including multi-region deployments, backup automation, and failover mechanisms.
- Automate operational tasks using Python, Bash, or Go to improve efficiency and reduce manual intervention.
What We're Looking For
- 7+ years of experience as a DevOps Engineer, Site Reliability Engineer (SRE), or Cloud Infrastructure Engineer.
- Strong hands-on experience with all three major cloud platforms: AWS, Azure, and GCP.
- Experience managing bare metal servers and on-premises infrastructure, including server provisioning, configuration management (Ansible, Puppet, Chef), and hardware lifecycle management.
- Expert-level knowledge of Kubernetes and container technologies (Docker, containerd, CRI-O), including self-hosted Kubernetes clusters on bare metal.
- Proven experience with Infrastructure as Code tools (Terraform, Pulumi, Ansible, or equivalent) for both cloud and bare metal environments.
- Deep understanding of CI/CD principles and experience building pipelines for complex applications.
- Strong Linux system administration skills and networking fundamentals (TCP/IP, DNS, load balancing, VPNs, VLAN configuration).
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, New Relic, or similar).
- Knowledge of security best practices including secrets management, encryption, and compliance (SOC 2, ISO 27001, GDPR).
- Proficiency in scripting languages (Python, Bash, Go) for automation.
- Experience with GitOps workflows and tools like ArgoCD or Flux.
- Familiarity with service mesh technologies (Istio, Linkerd) is a plus.
- Excellent problem-solving skills and ability to troubleshoot complex distributed systems.
- Strong communication skills and experience working in agile, cross-functional teams.
Nice to Have
- Experience with AI/ML infrastructure and MLOps pipelines.
- Knowledge of FinOps practices for cloud cost optimization.
- Certifications such as AWS Solutions Architect, Azure Administrator, GCP Professional Cloud Architect, or CKA/CKAD.
- Experience with serverless architectures (Lambda, Azure Functions, Cloud Functions).
- Experience with the n8n workflow automation platform is a great plus; we use it extensively for our AI agent deployments.
Why Join Us?
- Work on cutting-edge AI and automation infrastructure projects.
- Shape the infrastructure that powers solutions for top-tier global brands.
- Full ownership from design to implementation: see your architecture decisions go live.
- Based in Lisbon, Portugal, one of Europe's most vibrant tech hubs, with hybrid flexibility.
- A chance to grow with a fast-moving, founder-led company at the forefront of enterprise AI orchestration.
- Competitive compensation and benefits package.