AI Fund
About the Company
Baseten provides the necessary infrastructure, tooling, and expertise to launch AI products quickly. Backed by top investors, Baseten supports a wide range of machine learning models and helps organizations deliver scalable, reliable, and efficient solutions.
About the Role
Baseten is seeking a Site Reliability Engineer to build and manage infrastructure that supports the deployment of machine learning models. The ideal candidate will ensure the platform’s scalability, reliability, and performance. This role combines engineering expertise with project management skills to implement technical solutions and improve the user experience for clients.
Responsibilities
- Infrastructure Management: Build and maintain scalable infrastructure for deploying and operating machine learning models.
- Automation & CI/CD: Automate processes, especially those related to CI/CD pipelines, to improve efficiency and reduce manual work.
- Project Ownership: Own projects end-to-end, from specification to execution, balancing technical challenges with user empathy.
- Collaboration: Work closely with cross-functional teams to define requirements and translate them into effective technical solutions.
- Mentorship: Mentor junior engineers, contributing to team knowledge sharing and best practices.
- Problem Solving: Navigate ambiguity and make decisions on tool usage and tradeoffs to avoid unnecessary complexity.
- Reliability & Performance: Establish and maintain standards for system reliability and performance.
Required Skills
- Experience: 3+ years working in a high-growth, fast-paced environment.
- Technical Proficiency: Extensive experience with Kubernetes and infrastructure-as-code tools like Terraform, CloudFormation, or Pulumi.
- CI/CD: Hands-on experience with CI/CD tooling (GitHub Actions, GitLab CI, Jenkins, CircleCI).
- Observability: Experience with observability tools such as Prometheus, Grafana, the ELK stack, and OpenTelemetry.
- Project Management: Ability to take ownership of projects from specification to execution.
- Problem Solving: Strong judgment and the ability to navigate complex technical challenges.
Preferred Qualifications
- Machine Learning: Willingness to learn ML concepts and processes; no prior ML experience is required.
- Networking: Experience with networking concepts and infrastructure optimization.
- Scripting: Experience with Python or shell scripting for automation.
Benefits
- Compensation: Competitive salary, a flexible PTO policy, and a 401(k).
- Health & Wellness: Covered healthcare premiums.
- Work Environment: Fully remote with a collaborative and inclusive company culture.
- Learning & Development: Exposure to various ML startups and networking opportunities.
- Growth Opportunities: Work in an innovative space with professional development and learning resources.