Full Time
Alpharetta
Posted 21 hours ago

Spectraforce Technologies

Title: Site Reliability Engineer II
Location: Alpharetta, GA (3 days a week onsite)
Duration: 6 months

Job Description:
We are seeking a skilled Site Reliability Engineer to join our team and help build, maintain, and scale our cloud-native infrastructure. You will work closely with development and operations teams to ensure our systems are reliable, scalable, and efficient. The ideal candidate is passionate about automation, observability, and infrastructure-as-code, and thrives in a collaborative, fast-paced environment.

Key Responsibilities

* Design, implement, and manage cloud infrastructure on Azure using Terraform and Terragrunt.

* Maintain and optimize Kubernetes clusters on Azure Kubernetes Service (AKS).

* Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD for GitOps deployments.

* Enhance system reliability by implementing monitoring, alerting, and observability solutions with Grafana.

* Automate operational tasks to reduce toil and improve team efficiency.

* Participate in on-call rotations, incident response, and post-mortem analysis.

* Collaborate with development teams to improve application performance, scalability, and resilience.

* Implement and advocate for SRE best practices, including SLIs, SLOs, and error budgets.

* Continuously improve system performance, cost efficiency, and security.

Required Skills & Qualifications

* 3+ years of experience in an SRE, DevOps, or cloud infrastructure role.

* Strong experience with Azure cloud services and infrastructure.

* Hands-on experience with Java, and with Terraform and Terragrunt for infrastructure-as-code.

* Proficiency with Kubernetes (preferably AKS) and container orchestration.

* Experience with CI/CD tools, especially GitHub Workflows/Actions and ArgoCD.

* Solid understanding of observability tools like Grafana (experience with Prometheus, Loki, or Tempo is a plus).

Education Requirements: Bachelor’s degree required (Master’s preferred)
