Data Engineer (Full-Time, Remote)

42 Technologies

Posted 3 years ago

42 is an end-to-end analytics stack for retailers and brands!

Whether it's store managers running A/B tests on new product placement, CEOs looking at their global store performance, or merchandisers planning the next season, our platform gives retailers and brands instant visibility into their business.

The majority of retailers don't have the in-house expertise to spin up data infrastructure. Instead, they piece together reports in Excel on a daily basis. This works fine for small eCommerce brands, but not for retailers in the $50M-$5B range. That's where we come in.

We offer our customers:

Retail-specific dashboards: best-practice metrics / visuals are built-in and customizable
No integration: we make their data and systems work with our platform
Hosted infrastructure: we host the pipelines and data warehouse

So, why join?

We are growing and profitable
We have high user engagement
We run the analytics for brands you know
We are Y Combinator-backed
Small team, lots of opportunities to grow and shape the future of the company
We are a no ego, no BS, collaborative group passionate about our product and customers

Here's our stack

App: Isomorphic JS / TS, Lerna, SQL
Data: PySpark, OLAP SQL
Infrastructure: AWS, GCP, Docker, K8s, Google Dataproc, in-memory columnar database

What you'll be responsible for

Identifying relevant data in a retailer’s system landscape
Working with internal and external stakeholders to understand data and reporting requirements
Automating the cleaning and merging of datasets across similar data sources
Improving our internal data model to increase query performance and accommodate new functionality
Generalizing new features so that they work with all of our customers
Troubleshooting technical issues with performance, data discrepancies, alerts

Sample projects working at 42

Extract, clean, and merge sales data from Walmart, Target, and Amazon
Identify and deduplicate customer records
Upgrade the pipeline from a batch to a streaming system
Validate data from our data warehouse with unprocessed raw data
Analyze and tune the performance of Spark jobs
Evaluate and implement new tools to enhance our data pipeline

You are a team member who...

can overlap ~4 hours with San Francisco time (Pacific Time)
can work well as part of a fast-paced remote-first startup
has a Bachelor's degree, with a major in an analytical or technical field strongly preferred
has 1-3 years of professional experience in data engineering, data science, or analytical products
has work experience with Python (or similar languages); prior experience working with Apache Spark is preferred
has strong technical intuition and ability to understand complex business systems
has strong technical skills in SQL, ETL, and data analysis
has knowledge of data modeling concepts and implementation
is experienced with Git, the CLI, and general software development
is familiar with cloud platforms like AWS or GCP

What we provide

Competitive salary and equity package
Company games every 2 weeks, IRL meetup every 4-6 months
Flexible family benefits
Flexible vacation policy
Special requests welcome!

To wrap up, a few fun facts about us

We are a lean team located across three continents, and we speak six languages 🌏🌍🌎
More than half of us are parents 👶🐶🐱
We like sharing food updates 🍱🌮🍕, though we are split on loving or detesting peppers 🫑

If this sounds interesting to you, we would love to hear from you!

Apply via email [careers @ 42technologies.com] or the "Apply for this position" button – and put DON'T PANIC in the subject line to prove you are a human 🤖
