Senior DevOps Engineer - System Observability

Why we need you:

As a Senior Site Reliability Engineer, you will work as part of the team that manages and delivers monitoring and observability services across our production and pre-production systems.

Your responsibilities will include:

System design, configuration, integration, deployment, and operations of Observability systems and tools. These systems include collection of metrics/logs/events from gaming services, applications (client, middleware, backend) and infrastructure (AWS, on-premise). Together these Observability systems and tools serve as a critical part of PokerStars operations services
Design and deploy our Observability infrastructure and systems, taking them to the next level of availability and scale
Ensure our Observability platform exceeds goals for availability, capacity, efficiency, scalability, and performance
Develop metrics and log ingestion pipelines for high volumes of telemetry
Create build and deployment pipelines for monitoring tools
Deploy monitoring solutions into AWS, development and production environments
Develop a set of alerts and metrics to keep your own services alive and performing well
Collaborate with other SRE team members to improve the efficiency and reliability of monitoring solutions
Collaborate with our Application Development teams to define the standards/APIs that ensure our Applications are emitting the right telemetry (metrics, logs, traces, events)
Collect, aggregate and visualize metrics to provide visibility into, and standards for, the key indicators of the health of our most critical systems
Develop software to analyse real-time metrics feeds and produce actionable insight. In the longer term, move towards machine learning to surface anomalies automatically
Migrate Observability tools to Kubernetes
Evaluate, choose, and implement the next generation of Observability tools

Who are we looking for:

As a Senior SRE Observability Engineer, you have extensive working experience building, integrating and administering systems that leverage open-source monitoring tools at scale (e.g., InfluxDB/TICK Stack, Prometheus), Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) and Grafana. Some of your experience is focused on coding and scripting (mostly Python, Java and Bash). You have developed metrics and log ingestion pipelines for high volumes of telemetry. We work with Atlassian products (Jira, Confluence, Bitbucket Server), so it’ll be good if you have used them too.

We try to follow the best methodologies and IT operations practices for an always-up, always-available service, but you will be able to suggest improvements. Our environment is Agile, so it’ll be good if you have worked in such teams.

You are a quick learner who can rapidly absorb a lot of information about our in-house framework and systems. In this position you will need to show good soft skills and the ability to liaise with technical teams and product/business people. You can work under pressure whilst maintaining accuracy and attention to detail. As a team we are results-oriented and rely on good communication to achieve success.

As the ideal candidate, you will have:

You have experience with, or exposure to, the following:

B.Sc. in Computer Science or similar
4+ years of experience with Open-Source Monitoring & Observability tooling/integration
Time Series Databases (TSDB) - InfluxDB/TICK Stack, Prometheus
Elastic Stack (Elasticsearch, Logstash, Kibana, Beats)
Grafana
Full proficiency with Linux command line environment
Strong scripting in Python and Bash
Programming experience in Java, Golang is a big plus
Expertise in Configuration and Deployment Automation using Salt and/or Ansible
Monitoring protocols/frameworks – Prometheus/Influx line format, SNMP, JMX, Spring Boot Actuator
Building software using Jenkins, JFrog Artifactory
Git and versioning software
AWS Cloud services
Containerisation experience (Kubernetes and Docker)
Middleware (Tomcat, Kafka)
Experience with Consul, Vault, Terraform is a plus
Some familiarity with open Observability initiatives (e.g., OpenTracing, OpenCensus, OpenMetrics)

Technical Skills

AWS

Is a Remote Job?

Hybrid (Remote with required office time)

Employment Type

Full time

PokerStars is part of Flutter Entertainment Plc, a global sports betting, gaming and entertainment provider headquartered in Dublin and part of the FTSE 100 index of the London Stock Exchange, which...
