Valtech is looking for a Site Reliability Engineer (SRE). Are you passionate about Site Reliability Engineering, do you have an eye for SLIs, SLOs, and automation, do you like eliminating toil, and does it excite you to get things done in close collaboration with people around the globe? Would you like the freedom to work from the comfort of your home and also have the opportunity to visit any of our offices close to you? Then you might be the person we’re looking for! Keep reading to find out.
Over the last few years, experience and commerce platforms have evolved drastically into complex ecosystems that tie together multiple services from multiple vendors, an approach known as MACH architecture. As a founding member of the MACH Alliance, a group that educates enterprises on best-of-breed Microservices, APIs, Cloud, and Headless (MACH) technology, Valtech is a pioneer in building and managing those complex ecosystems properly. Site reliability engineering is at the core of our vision of how this modern-day distributed ecosystem can and should be managed.
As a Site Reliability Engineer (SRE), you fulfil an essential role. You will mainly be responsible for the continuity and reliability of our clients' commerce and experience platforms in production, in continuous collaboration with developers, QA engineers and cloud engineers. You will work with our multidisciplinary teams in a DevOps way of working, where your main responsibility is to keep everyone focused on production while creating the facilities to do so.
Your responsibilities will be:
- Monitor performance, availability, and security of applications and services in cloud environments
- Enhance and maintain CI/CD pipelines
- Analyze and troubleshoot issues in development and production environments
- Support teams in testing and improving logging
- Set up and maintain alerting systems
- Define and maintain SLOs to ensure system reliability
- Set up backup and disaster recovery processes
- Provide proactive support/maintenance
- Collaborate with development teams and provide insights for improvements
You are someone with 2 years of experience in the field of Site Reliability Engineering. Leading up to that, you have gained a profound level of expertise in either cloud engineering, DevOps engineering or software engineering. Taking the lead is something that you feel comfortable doing. In your current role, people come to you for advice on what to look for to determine the robustness of their production environments, advice for reliable deployment procedures, assistance in the analysis of failure scenarios and ideas on how to mitigate or remediate those.
- Good communication skills, capable of taking the lead and coaching a development team to make the right choices
- Experience with incident management in a production environment of a public-facing online service with high business value and preferably high traffic in a 24x7 fashion
- Experience working in corporate environments
- Experience with programming and scripting languages, e.g. Java or Python
- Knowledge of serverless services in one or more public cloud providers (AWS, Azure, GCP)
- Extensive knowledge of and experience with various monitoring systems, including APM and observability tools such as Datadog, New Relic, Dynatrace, Prometheus, Grafana
- Knowledge of and experience with various pipelining tools, such as GitHub, Azure DevOps, GitLab, Jenkins
- Knowledge of and experience with microservices-related technology: Docker, Kubernetes
- Good conceptual understanding of software architecture and system thinking
- Experience working as an engineer in a DevOps context
- Experience in debugging, optimizing, and proposing changes to application code for scalability, resilience, and monitoring
- Familiar with the automation of routine tasks
- Strong problem-solving skills, effective communication, and a proactive attitude
- An excellent command of English (C1 or above)
Discover our German benefits here
Apart from the country-specific benefits listed, we have more to offer. Our growth development program, for example, to help you excel in your ex