Platform Engineer - remote

Posted 3 years ago
Stack Overflow

We're rebuilding the code-running system from the ground up, targeting 500-millisecond execution times across the board, so we can be the industry leaders in code-running speed. This exciting project will be your primary focus for your first year at Dataquest!

This is a hybrid role in which you'll do a mix of backend software development and infrastructure/DevOps work. You'll be writing Python code, implementing new features of the Code Running Service, participating in code reviews, and occasionally troubleshooting application bugs reported by our students and internal users.

You'll join the Platform Team, which is responsible for maintaining the infrastructure and tooling that supports the user-facing application, software development process, and content development process. The Platform Team also helps out with data engineering needs as necessary, and collaborates with product engineering on application performance.

Here are some examples of projects you might work on in your role as Platform Engineer at Dataquest:

  • Add a new feature to the Code Running Service, for example, to support a new answer checking feature, or support for a new language
  • Extract the Code Running Service as a microservice, separate from our backend API
  • Create a new dashboard in Datadog to monitor detailed Code Running Service performance
  • Add a new feature to the internal authoring tool (Flask & React app), for example, to support import from our legacy authoring format
  • Collaborate with product engineers to troubleshoot intermittently failing end-to-end tests
  • Perform infrastructure maintenance, such as upgrading a Kubernetes nodepool, or upgrading our Discourse server

Stack you will use

Because this is a hybrid role where you'll be using a large variety of tools, we don't expect you to already have experience with every tool in our stack! You do need to be excited about learning the ones you don't know yet.

  • Python
  • Django
  • Flask
  • JavaScript / React
  • R, SQL (additional languages we run code in)
  • Docker
  • Linux
  • Bash scripting
  • AWS (Lambda, EFS, S3, Kinesis, Redshift, API Gateway, etc.)
  • Google Cloud Platform (GKE, Cloud SQL, Redis, etc.)
  • Kubernetes
  • Jenkins
  • Terraform
  • Helm
  • Cloudflare
  • Datadog

What you can expect

In your first week, expect to:

  • Get to know your new colleagues on the Engineering Team
  • Get familiar with Dataquest Engineering Team processes
  • Start getting familiar with the tools and technologies in our infrastructure & CI/CD stack
  • Implement your first small feature or fix your first bug, and deploy it to production!

In your first month, expect to:

  • Get to know the rest of the Dataquest team
  • Start getting familiar with the Code Running Service platform and code
  • Implement your first small feature or bug fix for the Code Running Service platform
  • Participate in a team planning cycle (collaborating to plan which features to work on next, determine priorities, and set goals)
  • Implement your first large feature for infrastructure or CI/CD, and deploy it

In your first three months, expect to:

  • Make improvements to the infrastructure, CI/CD tools, and Code Running Service
  • Deploy new features and bug fixes to production weekly
  • Regularly contribute to technical documentation across the infrastructure, CI/CD tools, and Code Running Service
  • Regularly bring new ideas (technical or feature proposals) for infrastructure, CI/CD tooling, or Code Running Service
  • Collaborate with colleagues on the Engineering and Content Teams on new features and tool improvements
  • Participate in the quarterly review cycle, including providing self and manager reviews
  • Join the Level 1 on-call rotation
  • Collaborate with your manager to finalize your role description and responsibilities, and identify your career path at Dataquest

In your first year, expect to:

  • Take technical ownership over a large component of the infrastructure, CI/CD tooling, or Code Running Service
  • Proactively maintain technical documentation for the Code Running Service, infrastructure, and CI/CD tools
  • Contribute to improving Engineering Team strategy, processes, communication, and culture
  • Move from Level 1 to Level 2 on-call rotation
  • Mentor colleagues on the Engineering and Content Teams regarding technical best practices in your area(s) of expertise
  • Participate in the annual review cycle, including providing self, peer, and manager reviews

What we're looking for - Minimum requirements

  • Comfortable with the Linux command line
  • Experience using Docker
  • Some experience with server and network administration OR CI/CD tooling
  • Experience with cloud infrastructure (AWS, GCP, Azure, etc.)
  • Understanding of Infrastructure as Code principles
  • Desire to learn new technologies and tools
  • Good communication skills with audiences of varying levels of technical expertise
  • At least 3 years of experience as an engineer (any engineering or very technical role)

Nice to have experience

  • Experience with more of our stack (listed above)
  • Experience with Jupyter (for example, developing new kernels or features, maintaining a Jupyter Enterprise Gateway or Jupyter Kernel Gateway deployment, etc.)
  • Application, server, or network security expertise