Senior DevOps Engineer

📍 Berlin, Germany (Hybrid) · Full-time · Infrastructure

About Avanai

Avanai helps enterprises reimagine work with AI agents and automation that deliver measurable results, not surprises. We design, build, and run enterprise-grade AI agents and automated workflows that accelerate decision-making, cut costs, ensure compliance, and unlock ROI in the first year.

The Role

We're looking for a Senior DevOps Engineer with deep expertise in multi-cloud environments (AWS, Azure, and GCP) and bare metal infrastructure to architect, build, and maintain our infrastructure. You'll play a critical role in ensuring our AI-powered platforms are scalable, secure, and highly available, whether deployed in the cloud or on client-owned on-premises servers. This is a hands-on role where you'll design infrastructure as code for hybrid environments, build robust CI/CD pipelines, and implement best practices for both cloud-native and bare metal deployments.

What You'll Do

  • Design, implement, and manage cloud infrastructure across AWS, Azure, and GCP, as well as bare metal servers for clients with on-premises requirements.
  • Build and maintain CI/CD pipelines using tools like GitHub Actions, GitLab CI, Jenkins, ArgoCD, or similar to enable rapid and reliable deployments.
  • Implement Infrastructure as Code (IaC) using Terraform, Pulumi, Ansible, CloudFormation, or Bicep to automate provisioning across cloud and bare metal environments.
  • Design and manage Kubernetes clusters (EKS, AKS, GKE, and self-hosted Kubernetes on bare metal) for container orchestration, including Helm charts, service mesh, and autoscaling configurations.
  • Set up comprehensive monitoring, logging, and alerting using tools like Prometheus, Grafana, Datadog, ELK Stack, or CloudWatch.
  • Implement security best practices including secrets management (HashiCorp Vault, AWS Secrets Manager), network security, IAM policies, and compliance frameworks.
  • Manage database infrastructure including PostgreSQL, MongoDB, Redis, and cloud-native databases, ensuring backup, recovery, and performance optimization.
  • Collaborate with development teams to optimize application deployments, troubleshoot production issues, and improve system reliability.
  • Implement disaster recovery strategies, including multi-region deployments, backup automation, and failover mechanisms.
  • Automate operational tasks using Python, Bash, or Go to improve efficiency and reduce manual intervention.

What We're Looking For

  • 7+ years of experience as a DevOps Engineer, Site Reliability Engineer (SRE), or Cloud Infrastructure Engineer.
  • Strong hands-on experience with all three major cloud platforms: AWS, Azure, and GCP.
  • Experience managing bare metal servers and on-premises infrastructure, including server provisioning, configuration management (Ansible, Puppet, Chef), and hardware lifecycle management.
  • Expert-level knowledge of Kubernetes and container technologies (Docker, containerd, CRI-O), including self-hosted Kubernetes clusters on bare metal.
  • Proven experience with Infrastructure as Code tools (Terraform, Pulumi, Ansible, or equivalent) for both cloud and bare metal environments.
  • Deep understanding of CI/CD principles and experience building pipelines for complex applications.
  • Strong Linux system administration skills and networking fundamentals (TCP/IP, DNS, load balancing, VPNs, VLAN configuration).
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, New Relic, or similar).
  • Knowledge of security best practices, including secrets management, encryption, and compliance (SOC 2, ISO 27001, GDPR).
  • Proficiency in scripting languages (Python, Bash, Go) for automation.
  • Experience with GitOps workflows and tools like ArgoCD or Flux.
  • Familiarity with service mesh technologies (Istio, Linkerd) is a plus.
  • Excellent problem-solving skills and ability to troubleshoot complex distributed systems.
  • Strong communication skills and experience working in agile, cross-functional teams.

Nice to Have

  • Experience with AI/ML infrastructure and MLOps pipelines.
  • Knowledge of FinOps practices for cloud cost optimization.
  • Certifications such as AWS Solutions Architect, Azure Administrator, GCP Professional Cloud Architect, or CKA/CKAD.
  • Experience with serverless architectures (Lambda, Azure Functions, Cloud Functions).
  • Experience with the n8n workflow automation platform is a great plus; we use it extensively for our AI agent deployments.

Why Join Us?

  • Work on cutting-edge AI and automation infrastructure projects.
  • Shape the infrastructure that powers solutions for top-tier global brands.
  • Full ownership from design to implementation: see your architecture decisions go live.
  • Based in Berlin, Germany, with hybrid flexibility: work from our office or remotely from within the city.
  • A chance to grow with a fast-moving, founder-led company at the forefront of enterprise AI orchestration.
  • Competitive compensation and benefits package.

Apply Now

Ready to join us? Fill out the form below and we'll be in touch.

⚠️

Important: Remote work for this position is limited to Berlin, Germany; it is not a fully remote role open to candidates anywhere in the world. You must either currently live in Berlin or be willing to relocate. Please note that we do not sponsor visas at this time; applicants must handle any relocation and visa arrangements independently.