
San R&D Business Solutions LLC | Full time
Site Reliability Engineer (SRE) – Azure
Atlanta, United States | Posted on 02/24/2026
We are seeking an experienced Site Reliability Engineer (SRE) with strong expertise in Microsoft Azure and proven experience supporting environments within the Banking or Financial Services industry. This role is responsible for designing and maintaining reliable, scalable, and secure cloud infrastructure while ensuring high availability and optimal performance of mission-critical applications in a regulated environment.
The ideal candidate brings a strong production support mindset and can effectively balance system reliability, automation, and delivery speed.
Key Responsibilities
- Design, deploy, and manage highly available, scalable cloud infrastructure on Microsoft Azure.
- Enhance system reliability, performance, and uptime through automation and proactive monitoring.
- Build, maintain, and optimize CI/CD pipelines for enterprise and cloud-native applications.
- Define, track, and improve SLIs, SLOs, and SLAs.
- Implement and manage observability solutions including logging, monitoring, and alerting.
- Support incident response, perform root cause analysis, and drive post-incident improvements.
- Automate infrastructure provisioning using Infrastructure as Code (IaC) practices.
- Ensure infrastructure and applications comply with banking security and regulatory standards.
- Collaborate closely with DevOps, development, security, and operations teams.
Requirements
Required Skills & Experience
- 7–12 years of experience in Site Reliability Engineering, DevOps, or Production Engineering roles.
- Strong hands-on experience with Microsoft Azure services including VMs, AKS, App Services, Networking, Storage, and Azure AD (now Microsoft Entra ID).
- Experience with Infrastructure as Code tools such as Terraform, ARM templates, or Bicep.
- Expertise in CI/CD tools such as Azure DevOps, Jenkins, or GitHub Actions.
- Strong scripting skills using PowerShell, Python, or Bash.
- Experience with Docker containers and Kubernetes orchestration.
- Prior experience working within Banking or Financial Services environments.
- Solid understanding of security, compliance, and risk management in regulated industries.
Preferred Qualifications
- Experience with monitoring and observability tools such as Azure Monitor, Prometheus, Grafana, or Splunk.
- Knowledge of high availability and disaster recovery architectures.
- Familiarity with ITIL processes and incident management frameworks.
- Microsoft Azure certifications such as AZ-104, AZ-400, or equivalent are a plus.