About the Company

Owner is a rapidly growing restaurant-commerce platform, powering the ordering flows, payments, and mobile apps that thousands of restaurants and millions of diners rely on daily. The company is on a mission to build resilient, high-performance systems that ensure reliable and seamless experiences for customers. With an innovative tech stack and a focus on scalability, Owner provides a crucial platform for the restaurant industry.

About the Role

As a Senior Site Reliability Engineer (SRE)/DevOps Engineer at Owner, you will help ensure the platform’s systems are reliable, performant, and resilient. This hybrid role will involve working with development teams to design systems for uptime, optimize deployment pipelines, and handle cloud infrastructure. You will help maintain and improve high-availability systems while automating processes and enhancing security across platforms.

Responsibilities

Design for Reliability: Set SLOs/SLIs, build self-healing architectures, and drive incident-prevention efforts to keep p95 latency for APIs and real-time ordering flows under 100 ms.
Own Observability: Improve dashboards, alerts, and distributed tracing to detect issues proactively.
Automate Deployments: Evolve Buildkite pipelines and Terraform modules for quick rollouts and clean rollbacks.
Champion Security & Compliance: Harden infrastructure, manage IAM privileges, and guide SOC 2 / PCI efforts.
Partition & Scale Data Stores: Optimize Postgres for multi-TB workloads, manage MongoDB sharding, and manage Kafka topics.
Lead Incident Response: Participate in on-call rotations, conduct post-mortems, and implement actionable fixes.
Mentor & Collaborate: Work with engineers on capacity reviews, guide junior developers on Docker best practices, and promote a “you build it, you run it” culture.

Required Skills

5+ years of experience running production workloads on AWS (or GCP/Azure) with infrastructure-as-code tools such as Terraform, CDK, or CloudFormation.
Hands-on experience with container orchestration (ECS, EKS, Kubernetes, Nomad).
Deep knowledge of at least two core datastores (Postgres, MongoDB, Kafka), including backups, upgrades, and performance tuning.
Fluency in CI/CD pipeline management (Buildkite, GitHub Actions), and automation using shell, Python, or TypeScript.
Experience setting up monitoring and alerting using Datadog, Prometheus, or similar tools.
Strong knowledge of Linux networking, load balancing (Cloudflare/ELB), and CDN/edge security.
Proven ability in incident management, root cause analysis, and action item follow-up.
A passion for customer-centric thinking and continuous learning.

Preferred Qualifications

Experience with NestJS or other Node.js backends at scale.
Familiarity with PCI-DSS or SOC 2 compliance environments.
Exposure to GitOps workflows (Argo CD, Flux).
Experience with mobile CI (React Native pipelines), feature flags (LaunchDarkly), or chaos engineering.

Benefits

100% remote work across the U.S. or Canada (option to work from the SF office).
Comprehensive health, dental, and vision coverage.
Home-office stipend, top-tier laptop, and required tools.
Twice-annual team off-sites.
Flexible working hours and generous paid time off.
