Senior Site Reliability Engineer - remote

Posted 3 years ago

Company Description

YouGov is an international research and data analytics group.

Our mission is to supply a continuous stream of accurate data and insight into what the world thinks so that companies, governments and institutions can better serve the people and communities that sustain them.

We have the best data and the best tools. We continuously challenge conventional approaches to research, and we disrupt our industry to ensure that our clients always get the best solutions.

We are driven by a set of shared values. We are fast, fearless and innovative. We work diligently to get it right. We are guided by accuracy, ethics and proven methodologies. We trust each other and bring these values into everything that we do.

Job Description

YouGov is searching for a Senior Site Reliability Engineer to help with technical planning and execution for our Site Reliability Engineering (SRE) team.

In this collaborative role, you'll work with senior directors and a group of engineers, sharing responsibility for the delivery, optimization, resilience, and availability of high-value, high-transaction-rate services trusted and used by both the general public and some of the largest brands in the world. You'll collaborate on technical planning, participate in selecting vendors, and help drive the adoption of best practices across all YouGov technology groups. You'll work at a fast pace with autonomy, and you will have the opportunity to train fellow SREs.

What you will do:

  • Collaborate on planning for SRE projects by helping translate high-level business goals into project goals.

  • Contribute to the technical design of SRE plans and projects, and carry out their technical implementation.

  • Help select vendors to meet SRE requirements.

  • Provide strong and positive mentorship to fellow SREs and to other engineers.

  • Participate in support requests for YouGov’s production environment (not on-call).

  • Establish error budgets for our products by monitoring SLIs, tracking performance against SLOs, and publishing the results to dashboards that are useful to the business.

  • Drive blameless post-mortems across all technology teams, and use error budgets to prioritize any necessary changes.

  • Identify and solve critical problems and build automation to prevent their recurrence.

  • Design, develop, and implement supporting cloud services on the Kubernetes platform.

Qualifications

  • 5+ years' work experience in a similar role.
  • Strong analytical and problem-solving skills.
  • Strong experience with log aggregation, monitoring, and APM tools such as New Relic, Sentry, ELK, and Prometheus
  • Kubernetes knowledge and experience, including clusters of 50+ nodes
  • Experience with cloud (AWS) and on-premises setups
  • Strong Linux background and understanding of networking.
  • Thorough knowledge of SRE best practices
  • Experience working with fully remote teams
  • Experience administering and/or designing SQL and NoSQL databases (preferred but not required)
  • Exposure to Python web applications (preferred but not required)
  • Experience working with Agile project management methodologies

Additional information

This is a full-time, permanent remote role that can be based in a YouGov office or at a remote location in the UK or Europe. We are a global team with developers in the US, South America, Europe, and India.