Site Reliability Engineer (SRE) - Database as a Service (DBaaS), Remote (Home Office) at Qdrant
Qdrant is an Open-Source Vector Database.
We help businesses take advantage of modern AI technologies. We are developing neural search solutions that allow everyone to use state-of-the-art neural network encoders at production scale. At the same time, we help companies integrate our technology into their infrastructure. Our flagship product is the open-source vector database: https://github.com/qdrant/qdrant
One of the technical challenges we face is building the cloud infrastructure that serves our engine as a scalable cloud API. We are looking for a Site Reliability Engineer to ensure the stable and secure operation of our managed solutions. If you're passionate about Site Reliability Engineering, Python, Go, and Kubernetes, and want to contribute to the growth of a cutting-edge Database as a Service, we want to hear from you! Apply now and become a key player in shaping the reliability and scalability of our DBaaS platform.
Tasks
- Infrastructure Automation: Design, implement, and manage infrastructure code using Terraform, focusing on the reliability and scalability of our Database as a Service (DBaaS) platform.
- Programming Mastery: Use Python and Go to improve our service quality and to develop automation scripts and tools for monitoring, deployment, and maintenance tasks specific to database operations.
- Kubernetes Expertise: Apply a deep understanding of Kubernetes to ensure optimal performance, scalability, and reliability of our DBaaS platform.
- Operator Frameworks: Develop and maintain Kubernetes Operators for automating database platform operations, enhancing the reliability of our services.
- Multi-Cloud Management: Architect and maintain infrastructure in multi-cloud environments (AWS, GCP, Azure) to provide a resilient and available DBaaS solution.
- Monitoring and Incident Response: Implement effective monitoring solutions tailored for database services and collaborate on incident response procedures to maintain the high availability of our systems.
- Service Level Objectives (SLOs) and Agreements (SLAs): Define, measure, and maintain SLOs and SLAs specific to database performance and reliability, actively monitoring and optimizing systems to meet these targets.
Requirements
- Site Reliability Engineering Focus: Proven experience in a Site Reliability Engineering or similar role, with a strong emphasis on database systems.
- Programming Languages: Proficiency in Python and Go; experience with other languages is a plus.
- Kubernetes Skills: Proven hands-on experience managing and optimizing Kubernetes clusters, particularly in the context of database services.
- Operator Frameworks: Strong background in developing and maintaining Kubernetes Operators, with a focus on database automation.
- Infrastructure as Code (IaC): Solid understanding of, and experience with, Terraform, Ansible, or Pulumi, specifically applied to database infrastructure.
- Multi-Cloud Expertise: Experience working with multi-cloud environments (AWS, GCP, Azure), ensuring seamless database operations across platforms.
- Container Orchestration: Deep understanding of containerization concepts and orchestration tools (Docker, Kubernetes) within the DBaaS context.
- SLOs and SLAs: Demonstrated experience in defining, implementing, and meeting Service Level Objectives and Agreements, particularly in the context of database reliability.
- Problem Solving: Strong analytical and problem-solving skills, with keen attention to detail.
- Communication Skills: Excellent communication and collaboration skills, with the ability to convey complex technical concepts to diverse audiences.
Benefits
- Working in a passionate international team
- Competitive salary plus perks
- Flexible working hours
- Company events
- Choose any hardware
- Remote-first / home office
- Relocation option