Principal Kubernetes DevOps Architect – Global Scale

  • Full Time
  • San Francisco
  • 150.000000 - 200.000000
Zoom


Requirements

  • We are seeking a Principal Kubernetes DevOps Engineer who combines deep technical expertise with broad system understanding

  • This engineer should be capable of diving into a wide range of services and identifying systemic issues across architecture, CI/CD flow, and containerization environments

  • This role requires technical leadership, analytical skill, and cross-team collaboration to drive reliability, scalability, and modernization

  • 15+ years in DevOps or SRE roles for large-scale production systems. Strong hands-on background in Linux systems, networking, and distributed systems

  • Experience operating and designing low-latency, high-throughput backend services at global scale. Knowledge of media or real-time communication systems (e.g., MMR, WebRTC)

  • Knowledge of TCP/IP, routing, DNS, load balancing, and packet capture tools. Familiarity with colocation data center operations, including hardware provisioning and troubleshooting

  • Demonstrated experience with Terraform, Ansible, Kubernetes, Docker, and modern CI/CD pipelines. Strong problem-solving, debugging, and systems-level design skills

  • Occasional weekend work may be required

  • Ability to work with teams across the globe and across multiple time zones

What the job involves

  • At Zoom, we’re building the next generation of Cloud and Colocation (Colo) infrastructure that powers seamless communication and collaboration for millions of users worldwide

  • Leading deep-dive investigations across diverse services and environments, from real-time media systems to web, team chat, and AI, to uncover architectural or operational bottlenecks

  • Designing and implementing improvements in deployment pipelines, orchestration frameworks, and CI/CD automation to increase reliability and release velocity

  • Working closely with product and service owners to enhance containerization strategy, improve resource efficiency, and reduce operational friction

  • Partnering with the Meeting DevOps and Cloud Infra teams to modernize hybrid infrastructures spanning colocation data centers, AWS, OCI, and other cloud providers

  • Driving system observability, fault isolation, and resilience engineering, ensuring services meet strict availability and latency SLAs

  • Providing technical mentorship to DevOps engineers and influencing best practices in automation, monitoring, and release engineering. Championing a culture of data-driven reliability through postmortems, SLIs/SLOs, and continuous performance optimization

