Collaborating with the world's most innovative companies to build enduring brand and business value
About the Company
LiveRamp is a leading data collaboration platform that enables global organizations to activate, connect, and manage first-party data while upholding consumer privacy and data ethics. Trusted by top brands, tech companies, banks, retailers, and healthcare leaders, the platform helps maximize the value of data, improve customer engagement, and maintain compliance with evolving privacy standards. LiveRamp operates globally and supports flexible collaboration both across organizations and with its premier network of partners.
About the Role
The Site Reliability Engineer (SRE) will ensure the reliability, scalability, and performance of global products, supporting deployment, internal operations, and infrastructure. This role involves implementing SRE best practices, automating CI/CD pipelines, optimizing Kubernetes environments, and collaborating with distributed engineering teams across multiple regions.
Responsibilities
- Deploy and manage global products, including production and internal environments.
- Provide engineering support for system availability and operational issues as part of a follow-the-sun rotation that delivers 24/7 coverage.
- Collaborate with engineering teams to resolve core product issues efficiently.
- Maintain and enhance monitoring, alerting, and observability systems.
- Develop, maintain, and optimize CI/CD pipelines and Terraform configurations.
- Maintain and improve engineering operational documentation and SRE practices.
- Optimize performance and cost of systems, including Kubernetes container rightsizing.
- Collaborate closely with global SRE and engineering teams in locations including California, Paris, Nantong, Singapore, and Australia.
- Work with real-time and NoSQL databases such as SingleStore, ScyllaDB, Cassandra, or DynamoDB.
Required Skills
- 5+ years of experience in SRE, DevOps, or production engineering.
- Expertise in Infrastructure as Code (IaC) using Terraform.
- Experience building continuous integration pipelines in Jenkins or CircleCI.
- Hands-on experience with Kubernetes, containers, and public cloud platforms (AWS or GCP).
- Proven ability to deploy and monitor highly scalable products.
- Knowledge of FinOps practices and Kubernetes cluster autoscaling.
- Proficiency in Python or Go.
- Familiarity with SRE best practices and observability principles.
- Strong problem-solving, debugging, and automation skills.
- Experience securing systems in a public cloud environment.
- Ability to mentor and guide other engineers in SRE practices.
Preferred Qualifications
- Experience with high-availability systems in production environments.
- Knowledge of disaster recovery procedures and performance optimization.
- Strong communication skills to engage stakeholders effectively.
- Experience working in distributed, collaborative global teams.
- Understanding of cloud security, compliance, and governance.