We Power the Blockchain Economy. The complete institutional solution for seamless, secure, and efficient Web3 services.
About the Company
Blockdaemon drives the blockchain economy with a comprehensive suite of infrastructure solutions. Operating globally with ISO 27001 certification, the company offers technical depth, industry-leading SLAs, and 24/7 support across 70+ points of presence via 10+ cloud and bare-metal providers. Its platform serves exchanges, custodians, crypto platforms, financial institutions, and developers through dedicated nodes, APIs, staking, liquid staking, and MPC technologies, enabling secure, compliant, and scalable blockchain operations.
About the Role
The Site Reliability Engineer (SRE) will support Blockdaemon’s multi-cloud infrastructure across GCP, AWS, and Azure. The role focuses on designing, deploying, and maintaining highly available and resilient systems, ensuring performance and reliability at scale. This position combines automation, systems architecture, and incident management to maintain robust infrastructure for blockchain services.
Responsibilities
- Design and implement scalable, highly available, and resilient systems in collaboration with engineering teams.
- Manage and provision multi-cloud infrastructure using Infrastructure-as-Code tools like Terraform and Helm.
- Develop automation scripts and tools to streamline deployment, monitoring, and incident response.
- Configure monitoring and alerting systems to proactively detect and mitigate issues.
- Respond to critical incidents, perform root cause analysis, and implement preventive measures; participate in the on-call rotation as needed.
- Analyze performance metrics, identify bottlenecks, and optimize system efficiency.
- Collaborate with security teams to enforce best practices in access control, data protection, and regulatory compliance.
- Document configurations, procedures, and troubleshooting steps, and share knowledge to promote continuous improvement.
Required Skills
- Proven experience with cloud platforms (GCP, AWS, Azure) and Infrastructure-as-Code tooling (Terraform, Helm).
- Hands-on experience with CI/CD and GitOps tooling such as GitLab CI, ArgoCD, GitHub Actions, or similar.
- Strong analytical and problem-solving skills with the ability to troubleshoot complex issues independently.
- Excellent communication and collaboration skills for effective work in cross-functional teams.
- Strong architectural and security mindset.
Preferred Qualifications
- Advanced knowledge of Linux/Unix administration and networking concepts.
- Experience with monitoring tools such as Prometheus and Grafana.
- 5+ years maintaining Infrastructure-as-Code in multi-cloud environments.
- Familiarity with SOC 2 Type 1 and Type 2 compliance frameworks.
- Proficiency in scripting or programming (Bash, Go, Python, TypeScript).
- 2+ years of hands-on experience with highly available Kubernetes clusters.
- Background in incident management and resolution.
- Exposure to AI development tools and related security considerations.
- Passion for blockchain technology and decentralized systems.
- Experience with blockchain infrastructure, either professionally or personally.