Can be remote/hybrid
We're looking for a Site Reliability Engineer to join our Foundations team! You will work alongside talented individuals grouped in small agile teams. The Foundations team is composed of 4 squads: Core Infrastructure, Database Reliability, Engineering Productivity and Site Reliability Engineering. Together, they provide what we call the Software Factory, a portfolio of services that helps BlaBlaCar products offer more services to their members quickly and reliably.
The Site Reliability Engineering team (SRE) is responsible for providing best-in-class Observability tools to service teams. As an enabling team, we help BlaBlaCar engineers efficiently improve their service reliability. Empowering developers and bringing them our reliability expertise is at the core of our daily work.
- Make the Datadog migration (Unified Observability project) a success
- Maintain our Datadog setup, which brings the Logs, Traces and Metrics from all services together in one place for the benefit of engineers
- Provide expertise and advice to service teams to help them get the most value from their Observability
- Participate in the SRE on-call schedule
- Help engineering teams manage incidents
- Help engineering teams overcome any reliability obstacle (enabling team)
What will you need to be successful?
- Strong communication skills
- Solid knowledge of system and network troubleshooting
- Experience with Kubernetes
- Development skills in Java, Python or Golang
- Fluent in English; French is a plus (other languages are appreciated)
- Ability to define SLOs and SLIs
- Experience with an in-house Observability platform (Prometheus, Elasticsearch) or a SaaS one (Datadog, New Relic)
Nice to have
- Experience with a service mesh
- Hands-on experience with an automation framework