Principal Kubernetes DevOps Architect – Global Scale

Full Time
San Francisco
Posted 3 days ago
150.000000 - 200.000000

Zoom

Requirements

We are seeking a Principal Kubernetes DevOps Engineer who combines deep technical expertise with broad system understanding

This engineer should be capable of diving into a wide range of services and identifying systemic issues across architecture, CI/CD flow, and containerization environments

This role requires technical leadership, analytical skill, and cross-team collaboration to drive reliability, scalability, and modernization

15+ years in DevOps or SRE roles for large-scale production systems. Strong hands-on background in Linux systems, networking, and distributed systems

Experience designing and operating low-latency, high-throughput backend services at global scale. Knowledge of media or real-time communication systems (e.g., MMR, WebRTC)

Solid knowledge of TCP/IP, routing, DNS, load balancing, and packet capture tools. Familiarity with colocation data center operations, including hardware provisioning and troubleshooting

Demonstrated experience with Terraform, Ansible, Kubernetes, Docker, and modern CI/CD pipelines. Strong problem-solving, debugging, and systems-level design skills

Occasional weekend work may be required

Ability to work with teams across the globe and in multiple time zones

What the job involves

At Zoom, we’re building the next generation of Cloud and Colocation (Colo) infrastructure that powers seamless communication and collaboration for millions of users worldwide

Leading deep-dive investigations across diverse services and environments. Working on systems ranging from real-time media to web, team chat, and AI to uncover architectural or operational bottlenecks

Designing and implementing improvements in deployment pipelines, orchestration frameworks, and CI/CD automation to increase reliability and release velocity

Working closely with product and service owners to enhance containerization strategy, improve resource efficiency, and reduce operational friction

Partnering with the Meeting DevOps and Cloud Infra teams to modernize hybrid infrastructures spanning colocation data centers, AWS, OCI, and other cloud providers

Driving system observability, fault isolation, and resilience engineering, ensuring services meet strict availability and latency SLAs

Providing technical mentorship to DevOps engineers and influencing best practices in automation, monitoring, and release engineering. Championing a culture of data-driven reliability through postmortems, SLIs/SLOs, and continuous performance optimization
