About Avanai

Avanai helps enterprises reimagine work with AI agents and automation that deliver measurable results, not surprises. We design, build, and run enterprise-grade AI agents and automated workflows that accelerate decision-making, cut costs, ensure compliance, and unlock ROI in the first year.

The Role

We're looking for a Senior DevOps Engineer with deep expertise in multi-cloud environments (AWS, Azure, and GCP) as well as bare metal infrastructure to architect, build, and maintain our infrastructure. You'll play a critical role in ensuring our AI-powered platforms are scalable, secure, and highly available, whether they are deployed in the cloud or on client-owned on-premises servers. This is a hands-on role where you'll design infrastructure as code for hybrid environments, build robust CI/CD pipelines, and implement best practices for both cloud-native and bare metal deployments.

What You'll Do

Design, implement, and manage cloud infrastructure across AWS, Azure, and GCP, as well as bare metal servers for clients with on-premises requirements.
Build and maintain CI/CD pipelines using tools like GitHub Actions, GitLab CI, Jenkins, ArgoCD, or similar to enable rapid and reliable deployments.
Implement Infrastructure as Code (IaC) using Terraform, Pulumi, Ansible, CloudFormation, or Bicep to automate provisioning across cloud and bare metal environments.
Design and manage Kubernetes clusters (EKS, AKS, GKE, and self-hosted Kubernetes on bare metal) for container orchestration, including Helm charts, service mesh, and autoscaling configurations.
Set up comprehensive monitoring, logging, and alerting using tools like Prometheus, Grafana, Datadog, ELK Stack, or CloudWatch.
Implement security best practices including secrets management (HashiCorp Vault, AWS Secrets Manager), network security, IAM policies, and compliance frameworks.
Manage database infrastructure including PostgreSQL, MongoDB, Redis, and cloud-native databases, ensuring backup, recovery, and performance optimization.
Collaborate with development teams to optimize application deployments, troubleshoot production issues, and improve system reliability.
Implement disaster recovery strategies, including multi-region deployments, backup automation, and failover mechanisms.
Automate operational tasks using Python, Bash, or Go to improve efficiency and reduce manual intervention.

What We're Looking For

7+ years of experience as a DevOps Engineer, Site Reliability Engineer (SRE), or Cloud Infrastructure Engineer.
Strong hands-on experience with all three major cloud platforms: AWS, Azure, and GCP.
Experience managing bare metal servers and on-premises infrastructure, including server provisioning, configuration management (Ansible, Puppet, Chef), and hardware lifecycle management.
Expert-level knowledge of Kubernetes and container technologies (Docker, containerd, CRI-O), including self-hosted Kubernetes clusters on bare metal.
Proven experience with Infrastructure as Code tools (Terraform, Pulumi, Ansible, or equivalent) for both cloud and bare metal environments.
Deep understanding of CI/CD principles and experience building pipelines for complex applications.
Strong Linux system administration skills and networking fundamentals (TCP/IP, DNS, load balancing, VPNs, VLAN configuration).
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, New Relic, or similar).
Knowledge of security best practices, including secrets management, encryption, and compliance (SOC 2, ISO 27001, GDPR).
Proficiency in scripting languages (Python, Bash, Go) for automation.
Experience with GitOps workflows and tools like ArgoCD or Flux.
Familiarity with service mesh technologies (Istio, Linkerd) is a plus.
Excellent problem-solving skills and ability to troubleshoot complex distributed systems.
Strong communication skills and experience working in agile, cross-functional teams.

Nice to Have

Experience with AI/ML infrastructure and MLOps pipelines.
Knowledge of FinOps practices for cloud cost optimization.
Certifications such as AWS Solutions Architect, Azure Administrator, GCP Professional Cloud Architect, or CKA/CKAD.
Experience with serverless architectures (Lambda, Azure Functions, Cloud Functions).
Experience with the n8n workflow automation platform is a great plus, as we use it extensively for our AI agent deployments.

Work Environment & Culture

Important Note on "Hybrid" Work: While we describe this as a hybrid position, as a young, fast-growing company, we prioritize in-person collaboration and team building. Our CTO is based in our Berlin office, and having our technical team physically present is essential for knowledge sharing, mentorship, and maintaining our innovative culture. We expect team members to be in the office most days of the week, with occasional flexibility for remote work. We organize frequent team activities, collaborative sessions, and learning opportunities that require physical presence. This is fundamentally an office-based role with some remote flexibility, not a primarily remote position.

Why Join Us?

Work on cutting-edge AI and automation infrastructure projects.
Shape the infrastructure that powers solutions for top-tier global brands.
Full ownership from design to implementation: see your architecture decisions go live.
Collaborate directly with our CTO and technical leadership team in our Berlin office.
Be part of building our company culture and technical expertise through in-person collaboration.
A chance to grow with a fast-moving, founder-led company at the forefront of enterprise AI orchestration.
Competitive compensation and benefits package.