Sr. Site Reliability Engineer

SHEIN Distribution Corporation

Making the Beauty of Fashion Accessible to All.

About SHEIN

SHEIN is a global online fashion and lifestyle retailer offering its own-brand clothing as well as products from various global suppliers, all at affordable prices. The company is based in Singapore and has over 15,000 employees working from offices worldwide. SHEIN focuses on making fashion accessible to everyone by using advanced, on-demand production techniques to create a smarter, future-ready industry.

Job Overview

We are seeking an experienced Senior Site Reliability Engineer (Senior Site Reliability Engineer I) to join our Site Reliability Engineering (SRE) team. SREs at SHEIN are a mix of software and systems engineers focused on keeping our production systems running without downtime and on building highly reliable, efficient systems.

SREs collaborate with different teams to develop tools that help monitor, analyze, and alert on system operations. They aim to spot problems early and fix them quickly. They also work on improving the system’s performance, resource use, and stability. The role involves managing key open-source software that supports our platform and playing a central role in major engineering projects.

The goal is to enhance platform reliability, reduce the number of system incidents, and shorten the time it takes to resolve them. The team uses software development, networking, and systems engineering skills to handle complex, large-scale challenges and improve the service for our users.

Key Responsibilities

  • Participate in 24/7 on-call duty to keep the production system available at all times.

  • Monitor system capacity and usage, coordinating with other teams to scale services up or down as needed.

  • Manage and operate essential open-source tools like Elasticsearch, Kafka, RabbitMQ, and Redis.

  • Create tools and processes that improve system monitoring and reliability.

  • Handle and reduce downtime caused by system issues.

  • Work with service owners to define and monitor key service performance metrics.

  • Develop best practices for monitoring and deploying backend features.

  • Maintain technical documentation and runbooks.

  • Lead projects to improve the platform’s efficiency and keep it up to date with best practices.

  • Respond quickly to production problems, applying software, systems, and networking knowledge to resolve them and prevent recurrence.

  • Help reduce manual work through automation and improve system scalability and performance.

Qualifications

  • Bachelor’s degree in Computer Science, Information Systems, or a related field.

  • More than 5 years of experience supporting critical, real-time applications in a 24/7 production environment, especially in cloud settings.

  • Strong problem-solving skills with a proactive attitude and sense of responsibility.

  • Experience debugging and optimizing performance across entire systems, including cloud infrastructure, CI/CD pipelines, Java, and SQL and NoSQL databases.

  • Proficiency with monitoring tools such as Grafana, Prometheus, or Zabbix.

  • Programming experience in languages like Python or Go.

  • Familiarity with container technologies such as Docker and Kubernetes.

  • Good communication skills and ability to work with remote teams.

  • Experience with open-source software like Elasticsearch, Kafka, and Redis.

  • Solid understanding of SRE principles, including automation and reducing manual work.

Preferred Skills

  • Experience with big data technologies (e.g., Hadoop, Spark).

  • Strong Linux knowledge.

  • Fluency in Mandarin and English.

Benefits

  • Bonus and stock options.

  • Medical, dental, and vision insurance, plus prescription coverage.

  • Health savings and flexible spending accounts.

  • Company-paid life and disability insurance.

  • Various voluntary insurance options.

  • Employee assistance program.

  • Travel accident insurance.

  • 401(k) plan with company match and financial advice.

  • Paid vacation, holidays, sick days, and floating holidays.

  • Employee discounts.

  • Free weekly catered lunch.

  • Dog-friendly offices at some locations.

  • Free gym access at select sites.

  • Company swag and giveaways.

  • Annual holiday party and other events.

  • Free daily snacks and drinks.

Salary Range: $107,600 – $180,200 USD

Copyright © 2025 SRE-Jobs.com. All Rights Reserved.