Data Engineer (Full-Time, Remote)

42 Technologies
Posted 3 years ago
We Work Remotely
42 is an end-to-end analytics stack for retailers and brands!

Whether it's store managers running A/B tests on new product placement, CEOs looking at their global store performance, or merchandisers planning the next season, our platform enables retailers and brands to get instant visibility into their business.

The majority of retailers don't have the in-house expertise to spin up data infrastructure. Instead, they piece together reports in Excel on a daily basis. This works fine for small eCommerce brands, but not for retailers in the $50M-$5B range. That's where we come in.

We offer our customers:

  • Retail-specific dashboards: best-practice metrics / visuals are built-in and customizable
  • No integration: we make their data and systems work with our platform
  • Hosted infrastructure: we host the pipelines and data warehouse

So, why join?
  • We are growing and profitable
  • We have high user engagement
  • We run the analytics for brands you know
  • Y Combinator-backed
  • Small team, lots of opportunities to grow and shape the future of the company
  • We are a no ego, no BS, collaborative group passionate about our product and customers

Here's our stack
  • App: Isomorphic JS / TS, Lerna, SQL
  • Data: PySpark, OLAP SQL
  • Infrastructure: AWS, GCP, Docker, K8s, Google Dataproc, In-memory columnar database

What you'll be responsible for
  • Identifying relevant data in a retailer's system landscape
  • Working with internal and external stakeholders to understand data and reporting requirements
  • Automating the cleaning and merging of datasets across similar data sources
  • Improving our internal data model to increase query performance and accommodate new functionality
  • Generalizing new features so that they work with all of our customers
  • Troubleshooting technical issues with performance, data discrepancies, alerts

Sample projects working at 42
  • Extract, clean, and merge sales data from Walmart, Target, and Amazon
  • Identify and deduplicate customer records
  • Upgrade the pipeline from a batch to a streaming system
  • Validate data from our data warehouse with unprocessed raw data
  • Analyze and tune the performance of Spark jobs
  • Evaluate and implement new tools to enhance our data pipeline

You are a team member who...
  • can overlap ~4 hours with San Francisco time (Pacific Time)
  • can work well as part of a fast-paced remote-first startup
  • has a Bachelor's degree, with a major in an analytical or technical field strongly preferred
  • has 1-3 years of professional experience in data engineering, data science, or analytical products
  • has work experience with Python (or similar languages); prior experience working with Apache Spark is preferred
  • has strong technical intuition and ability to understand complex business systems
  • has strong technical accomplishments in SQL, ETL, and data analysis
  • has knowledge of data modeling concepts and implementation
  • is experienced with Git, the command line, and general software development
  • is familiar with cloud platforms like AWS or GCP

What we provide
  • Competitive salary and equity package
  • Company games every 2 weeks, IRL meetup every 4-6 months
  • Flexible family benefits
  • Flexible vacation policy
  • Special requests welcome!

To wrap up, a few fun facts about us
  • We are a lean team, located across three continents and speak six languages 🌍🌏🌎
  • More than half of us are parents 👶🐶🐱
  • We like sharing food updates 🍱🌮🍕 though we are split on loving or detesting peppers 🫑

If this sounds interesting to you, we would love to hear from you!

Apply via email [careers @ 42technologies.com] or the "Apply for this position" button, and put DON'T PANIC in the subject line to prove you are a human 🤖