Scalable Path is looking for a Back-end Python Data Engineer to work on a client project. This is a remote, part-time position (approximately 4 hours/day). The client is also open to candidates who are available full-time.
CLIENT COMPANY DESCRIPTION:
The client is an analytics and marketing company located in New Jersey. Their product team builds business intelligence, analytics, and admin tools for primarily pharma/healthcare customers.
DUTIES AND RESPONSIBILITIES:
The client has analytics software that it licenses to many end customers (mostly in the pharmaceutical industry). One of their clients (a top 10 pharma company in the US) is changing their tech stack, which means that there are some ETL (Extract, Transform, and Load) scripts that need to be modernized and rewritten in Python. The client would also like to make these scripts more efficient and add some new features. The purpose of these scripts is to read the client's data, merge it together, create reports, export the data, and load it into a front-end application.
There are five main areas of the application that need to be migrated, and it will be your responsibility to rewrite two of the five.
The existing ETL code is written in the SAS language (https://en.wikipedia.org/wiki/SAS_language). There are other members of the team who can explain the existing code, so knowing the SAS language is not required.
The source data are files in S3 in Parquet format (https://parquet.apache.org/) and the destination data are also written to S3.
Required Skills and Experience:
- Strong Python Coding Skills
- Experience working with large data sets, doing complex queries, merging data, and generating reports
- Experience branching and merging with Git (the client data team is new to Git and could use some guidance)
Desired Experience:
- Experience using DataFrames in Pandas (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)
Other Technologies Used (knowledge would be nice-to-have):
- Jupyter Notebooks (https://jupyter.org/)
- Amazon Athena (https://aws.amazon.com/athena/)
- Amazon SageMaker (https://aws.amazon.com/sagemaker/)
- Hadoop
RELATIONSHIPS - WHO YOU'LL BE WORKING WITH:
You will report to the VP and Analytics Lead and will work with a remote team of data scientists, Python developers, and QA engineers. The client is in the America/New York (-04:00) EDT time zone.
START DATE: As soon as possible.
EXPECTED CONTRACT DURATION: 1-3 months
The engagement may last longer if things go well. The client may want help with maintenance after the initial rewrite is complete.