Cloud Site Reliability Engineer (SRE)

  • Full Time
  • Alpharetta
Insight Global



Job Description

This role is primarily an operations incident response role for cloud issues in AWS, Azure, and GCP and includes cloud infrastructure management. The engineer will perform troubleshooting and performance analysis from a cloud perspective with the goal of reducing time to resolution. The role is focused on responding to incidents as well as strategizing and designing solutions to prevent incidents from recurring in the cloud environment. They will collaborate with the NOC, network engineering teams, platform teams, and application support teams in addition to working with the cloud provider.

Our goal is to modernize and stabilize our infrastructure. As we get pulled into incidents and issues, we want to resolve them quickly, then address the underlying causes and prevent them from recurring.

We are seeking a Cloud Site Reliability Engineer (SRE) to drive the reliability, scalability, and performance of our cloud-based infrastructure. The ideal candidate combines software engineering expertise with advanced systems operations skills to maintain highly available systems while reducing operational toil. This role involves automation, monitoring, capacity planning, incident response, and cloud platform management across a dynamic, distributed environment.

As a Cloud SRE, you will work closely with Engineering, Architecture, DevOps, and security teams to ensure seamless service experiences for our customers while contributing to platform design and operational efficiency.



Contract / Contract-to-Hire Roles:


Compensation:

$75/hr to $80/hr.

Exact compensation may vary based on several factors, including skills, experience, and education.

Employees in this role will enjoy a comprehensive benefits package starting on day one of employment, including options for medical, dental, and vision insurance. Eligibility to enroll in the 401(k) retirement plan begins after 90 days of employment. Additionally, employees in this role will have access to paid sick leave and other paid time off benefits as required under the applicable law of the worksite location.



We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters.

Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to [email protected]. To learn more about how we collect, keep, and process your private information, please review Insight Global’s Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Skills and Requirements

Experience with Azure, AWS, or GCP (experience in 2 of the 3 cloud platforms). The number of years doesn’t matter; this needs to be a person who can think through complex issues.

Experience with Splunk

Terraform experience for Infrastructure as Code (IaC) and cloud infrastructure management: deploy, manage, and optimize cloud resources


Python, PowerShell, Bash, or equivalent for automation and system management (one scripting language is fine)

VPCs, IAM, serverless architectures

Very collaborative team: works with other teams (platforms, engineering, networking) and asks how an issue can be resolved in other areas.

Scaling, sizing, performance.

Very strong with infrastructure and systems analysis in the cloud


Problem solver: ability to get to the bottom of issues to quickly remediate problems, and then think about how to fix them for good.


Self-starter

Ability to work in a high-pressure environment.

Automation, monitoring, capacity planning, incident response, and cloud platform management across a dynamic, distributed environment.


System Reliability & Availability: Design and maintain fault-tolerant, high-availability architectures across AWS, Azure, and GCP.

Implement redundancy, load balancing, and automated failover strategies.

On-call rotation every 5-6 weeks

Must be able to be onsite 5 days a week


Our Engineers play a critical role in the success of our clients and are expected to effectively communicate our recommended solutions in a consultative role for each client. Therefore, a successful candidate will possess a high degree of self-management, personal accountability, strong communication skills, and teamwork. The ability to interact, engineer, and communicate collaboratively at the highest technical levels with customers, vendors, partners, and all members of staff is required.

Moogsoft to automate (like disk latency and CPU utilization)

Dynatrace – reading logs

Containers & Orchestration: Experience with Docker and Kubernetes.

Cloud FinOps and utilization experience


Ansible playbooks (strong plus)



Copyright © 2026 SRE-Jobs.com. All Rights Reserved.