
Zoom
Requirements
- We are seeking a Principal Kubernetes DevOps Engineer who combines deep technical expertise with broad system understanding
- This engineer should be capable of diving into a wide range of services and identifying systemic issues across architecture, CI/CD flow, and containerization environments
- This role requires technical leadership, analytical skill, and cross-team collaboration to drive reliability, scalability, and modernization
- 15+ years in DevOps or SRE roles for large-scale production systems. Strong hands-on background in Linux systems, networking, and distributed systems
- Experience designing and operating low-latency, high-throughput backend services at global scale. Knowledge of media or real-time communication systems (e.g., MMR, WebRTC)
- Knowledge of TCP/IP, routing, DNS, load balancing, and packet capture tools. Familiarity with colocation data center operations, including hardware provisioning and troubleshooting
- Demonstrated experience with Terraform, Ansible, Kubernetes, Docker, and modern CI/CD pipelines. Strong problem-solving, debugging, and systems-level design skills
- Occasional weekend work may be required
- Ability to work with teams across the globe and across multiple time zones
What the job involves
- At Zoom, we’re building the next generation of Cloud and Colocation (Colo) infrastructure that powers seamless communication and collaboration for millions of users worldwide
- Leading deep-dive investigations across diverse services and environments, from real-time media systems to web, team chat, and AI, to uncover architectural or operational bottlenecks
- Designing and implementing improvements in deployment pipelines, orchestration frameworks, and CI/CD automation to increase reliability and release velocity
- Working closely with product and service owners to enhance containerization strategy, improve resource efficiency, and reduce operational friction
- Partnering with the Meeting DevOps and Cloud Infra teams to modernize hybrid infrastructures spanning colocation data centers, AWS, OCI, and other cloud providers
- Driving system observability, fault isolation, and resilience engineering, ensuring services meet strict availability and latency SLAs
- Providing technical mentorship to DevOps engineers and influencing best practices in automation, monitoring, and release engineering. Championing a culture of data-driven reliability through postmortems, SLIs/SLOs, and continuous performance optimization