Senior Database Reliability Engineer - FoundationDB - Remote

Cognite
Posted 4 years ago • remote

Want to help us bring our fundamental data stores to multiple clouds, both public and private?

About Cognite:

Cognite AS is a global industrial Software-as-a-Service (SaaS) company enabling the full-scale digital transformation of heavy-asset industries. Our core software product, Cognite Data Fusion (CDF), powers companies with contextualized OT/IT data to develop and scale solutions that increase safety, sustainability, and efficiency, and drive revenue.

About the Database Reliability Engineering Team:

Cognite Data Fusion contextualizes operational data at scale, enabling asset-intensive industries to make data-driven decisions. Our platform is built on many different technologies, each suited to solving different problems. Some of these are absolutely fundamental, and the Database Reliability Engineering team will be responsible for the continuous well-being of our portfolio of FoundationDB, PostgreSQL, Elasticsearch, and Kafka clusters. For some of these we expect to run thousands of clusters in the years to come, in both public and private clouds, through managed services and on self-managed Kubernetes clusters. Even when using mature as-a-Service offerings and Kubernetes operators, there are many things that can and will go wrong. Herding clusters that need upgrading, upscaling, cost-trimming, and recovery while continuously serving heavy workloads with tight SLOs requires solid reliability engineering.

About our Tech stack:

We work with open source technologies that need to run in multiple cloud environments: both public clouds (such as Google Cloud Platform and Azure) and private clouds with customer-provided Kubernetes.

Managed Kubernetes (GKE, AKS, OpenShift) forms the base on which we build our products. To validate the market, we initially built on PaaS offerings, such as Google Bigtable, Spanner, and Pub/Sub, to store state. We replicate data to different storage systems to be able to answer different types of queries. As we diversify the platforms our offering runs on, we are migrating to a self-run, FoundationDB-based scale-out data store for managing time series data. PostgreSQL and Elasticsearch are other important data stores in our stack.

Our backend developer teams work with Java, Scala, Python, and Rust. CI/CD is handled by a combination of GitHub, Jenkins, and Spinnaker, which test and deploy code to production. Infrastructure is managed as code with Terraform and Atlantis, and services are monitored using Prometheus, Grafana, and Lightstep.

As we establish our Database Reliability Engineering team, we are looking to hire two people to work on FoundationDB. We are looking for senior or principal engineers who either know FoundationDB or have experience with other high-performance distributed databases, along with the interest and willingness to dive deep and learn.

The FoundationDB Kubernetes Operator is written in Golang, and FoundationDB itself is written in Flow, an actor-based extension of C++ whose code is preprocessed into standard C++.

About the job to be done:

  • Join Cognite’s DBRE team as part of its FoundationDB sub-team, owning the full lifecycle of all our FoundationDB clusters.
  • Work across both public clouds and private Kubernetes deployments.
  • Establish robust reliability engineering to support these clusters, managing aspects like monitoring, chaos testing, alerting, on-call rotations, internal best-practices education, and capacity forecasting.
  • Enable product teams to focus on using the databases rather than running them, while engaging closely with those teams to make sure their products are operable at scale.

About you:

  • A master's degree in Computer Science or equivalent experience.
  • Broad experience with DevOps practices such as CI/CD and Infrastructure as code
  • Experience with large Cloud deployments on any of AWS, GCP, or Azure.
  • Familiarity with C++, Golang, or other programming languages.
  • 2+ years of direct FoundationDB operational experience, or 6+ years of Linux operations experience.
  • 2+ years working with similar distributed systems
  • Familiarity and experience with our tech stack is beneficial.

What we offer you:

  • An opportunity to make an impact on the industrial future and be part of disruptive and groundbreaking projects
  • In-depth exposure to FoundationDB, a modern cloud-scale distributed datastore
  • Help with relocating to Norway
  • Competitive salary and benefits (including pension plans, insurance, and more)
  • IT equipment and tools to allow you to be productive
  • Coverage of mobile telephone subscription and broadband connection
  • Extended private health services and free yearly health check
  • Free snacks and drinks throughout the day, to keep you running
  • Subsidized lunch at the canteen, with various food options
  • Free staffed gym
  • Social activities (book club, team sports such as football and boxing, and regular Cognite social events)
  • Free Norwegian courses for levels A1 to B1

Equal opportunities:

Cognite is committed to creating a diverse and inclusive environment and is proud to be an equal opportunity employer. Embracing diversity and inclusion means that all qualified applicants will receive the same level of consideration for employment, training, compensation, and promotion. We monitor equal assessment throughout the recruitment process, which is why we ask for your gender when you apply. Answering this question is appreciated but not mandatory, and your answer will not affect the assessment of your application in any way.

Other information:

Application deadline: ASAP