Data Engineer - remote

Go’s Data Science team is looking for an exceptional Data Engineer to join our fully distributed team. Team Go was started with the simple premise that our real-world connections matter, that people matter. We’re building a very different kind of social app, one that helps people get together to create memorable experiences in real life. We believe in a world where social technology unites us and makes us happier. Our team is on a mission to fix social isolation and bring people together to do things they love with the people they love.

As one of the first Data Engineers on the team, you will have an outsized impact on upstream and downstream processes. Your work will support all our data efforts, from democratizing self-service product analytics to enabling our machine learning technology.

Our Data team is a deliverables-based organization. We believe that you will do your best work if you have the autonomy to define when and from where you work. We also believe that your performance should be judged by parameters that we agree on collaboratively. But this job will not be easy: we are looking for a candidate who likes to solve hard technical problems at scale.

Key Responsibilities
  • Write clean, easy-to-read Python code, with plenty of comments for your peers and your future self.
  • Create and manage data pipelines between managed AWS data services (S3, Redshift) using AWS ETL tooling (Glue, Data Wrangler, Data Catalog).
  • Develop ETL and streaming data pipelines in Apache Spark (see the PySpark sketch after this list).
  • Manage the integration of various services with Fivetran.
  • Develop and maintain standards for the administration and operation of data pipelines, including scheduling, running, monitoring, logging, error handling, failure recovery, and output validation.
  • Respond to both fires and planned changes to our source data. Effective planning is critical for this role.
  • Contribute to the project planning process by estimating tasks and deliverables.
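
For a concrete flavor of the pipeline work above, here is a minimal PySpark sketch of a batch job that reads raw events from S3, cleans them, and writes partitioned Parquet for downstream loading into Redshift. The bucket names, paths, and columns are hypothetical; a production job would add configuration, error handling, and output validation.

    # Minimal PySpark batch ETL sketch: S3 JSON in, partitioned Parquet out.
    # All paths and column names below are hypothetical examples.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("events-etl").getOrCreate()

    # Read raw JSON events from a (hypothetical) landing bucket.
    raw = spark.read.json("s3://example-raw-bucket/events/")

    # Basic cleaning: drop malformed rows, normalize timestamps,
    # derive a date column for partitioning, and de-duplicate.
    clean = (
        raw.dropna(subset=["event_id", "user_id", "event_ts"])
           .withColumn("event_ts", F.to_timestamp("event_ts"))
           .withColumn("event_date", F.to_date("event_ts"))
           .dropDuplicates(["event_id"])
    )

    # Write partitioned Parquet that Redshift (via COPY or Spectrum) can consume.
    (clean.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3://example-curated-bucket/events/"))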
Skills and Qualifications
  • You have 2-3 years of solid experience as a Data Engineer working on production systems in cloud environments.
  • You’re technically competent with Python, PySpark, and at least one SQL dialect.
  • You have a BS or higher in Computer Science, Mathematics, Statistics, Economics, or another quantitative field.
  • You are a skilled written communicator. Our team is 100% remote and writing is our primary means of communication. You communicate complex technical topics clearly and in an approachable way.
  • You enjoy collaboration and knowledge sharing. You share our team’s value of humility, and you are eager to collaborate with teammates at any level of statistical or engineering knowledge.
  • You have experience documenting projects and processes with tools like Jira and Confluence.
  • You have experience creating and managing both relational and non-relational databases (Postgres, MongoDB, Cassandra, DynamoDB), as well as data lakes and data warehouses (AWS S3, Redshift).
  • You have DevOps experience and an understanding of Infrastructure as Code with tools like Terraform.
  • You have an understanding of ETL/ELT processes and design. You’ve built and maintained data pipelines using tools such as AWS Glue and Fivetran (see the Glue automation sketch after this list).
  • You understand that perfect is the enemy of good, and you like to stay away from over-engineered solutions. You like to find simple solutions that work, iterating upon them as needed.
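
As one illustration of the Glue experience described above, here is a minimal sketch, using boto3 (the standard AWS SDK for Python), of starting a Glue job run and polling it to a terminal state. The job name and region are hypothetical, and a real pipeline would typically delegate this orchestration to a scheduler with alerting and retries.

    # Minimal boto3 sketch: trigger a Glue job run and poll its status.
    # The job name and region are hypothetical examples.
    import time
    import boto3

    glue = boto3.client("glue", region_name="us-east-1")

    # Start a run of a (hypothetical) Glue job.
    run = glue.start_job_run(JobName="example-events-etl")
    run_id = run["JobRunId"]

    # Poll until the run reaches a terminal state.
    while True:
        status = glue.get_job_run(JobName="example-events-etl", RunId=run_id)
        state = status["JobRun"]["JobRunState"]
        if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
            print(f"Job run {run_id} finished with state {state}")
            break
        time.sleep(30)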