Full Time
Scottsdale, AZ
Posted 11 hours ago

Tap Growth AI

Tap Growth AI is an AI-powered platform that helps recruiters find and hire the right talent faster.

About the Company

A leading technology organization focused on delivering highly reliable and scalable digital solutions seeks to expand its engineering team. With a commitment to innovation and operational excellence, the company ensures high-performance systems and optimal experiences for users across critical platforms. The workplace is located in Scottsdale, AZ, United States, offering in-office collaboration and hands-on engagement with cutting-edge infrastructure.

About the Role

The Site Reliability Engineer (SRE) will ensure the availability, scalability, and efficiency of mission-critical systems. This role bridges development and operations, focusing on automation, monitoring, incident response, and infrastructure optimization to maintain high system reliability and performance.

Responsibilities

Design, implement, and maintain monitoring, alerting, and observability solutions.
Automate infrastructure provisioning and deployment pipelines to streamline operations.
Troubleshoot and resolve complex production incidents efficiently.
Analyze system performance, conduct capacity planning, and identify optimizations.
Apply security best practices and maintain disaster recovery procedures.
Collaborate with development teams to design and improve system architecture.

Required Skills

5+ years of experience in Site Reliability Engineering, DevOps, or systems engineering.
Strong proficiency in cloud platforms such as AWS, GCP, or Azure.
Expertise in scripting or programming languages such as Python, Bash, or Go.
Hands-on experience with containerization and orchestration tools (Docker, Kubernetes, etc.).
Familiarity with monitoring and observability tools such as Prometheus, Grafana, or the ELK stack.
Solid understanding of CI/CD pipelines and infrastructure-as-code practices.

Preferred Qualifications

Advanced troubleshooting and analytical problem-solving skills.
Experience with high-availability systems and large-scale production environments.
Knowledge of security practices and disaster recovery strategies.
Familiarity with performance tuning and capacity planning in cloud-based environments.
Experience collaborating with cross-functional teams in dynamic and fast-paced settings.
