Senior Data Engineer - remote

Paige

Posted 3 years ago • New York, NY

Java Linux Hadoop Python Redhat ApacheSpark

We’re seeking a Senior Data Engineer who will work on the development and support of software applications, tools and data management pipelines for research and clinical purposes. Following modern product development practices, you will also assist in the design, implementation and maintenance of tools that extract and manipulate data from various sources, including in-house and external databases. This is an extraordinary opportunity to be part of a high-performing team and to pursue a life-changing mission with unique technical challenges!

This position can be fully remote for Canada- and US-based applicants. We are working remotely now, but when it is safe to return, you may also opt to work from our NYC office.

Responsibilities

Work on Data Warehouse, Data Lake and BI projects and architectures at Paige.
Create and implement ETL pipelines that enable the extraction, transformation and transfer of large amounts of structured and unstructured data from various filesystems and databases for the development of computational pathology algorithms.
Handle the challenges that come with managing terabytes of data.
Build tools to manage, automate and monitor our data and data processing infrastructure.
Design and develop software tools and integrate them into existing resources. Be responsible for design, coding, testing, packaging, debugging, documentation and deployment of software systems.
Work independently to produce required functional, technical, and user documentation (e.g., business requirements, functional and technical specifications, system architecture, data flows, end-user training requirements) on assigned projects.
Collaborate with data engineers, scientists, engineers, IT operations and medical doctors to build data manipulation tools that support a new generation of artificial intelligence applications for cancer detection and treatment.

Requirements

Experience in architecting, implementing and testing data processing pipelines (e.g. Spark, Beam, ...) and data mining / data science algorithms, either on-premises or in a cloud environment.
Experience in administering and ingesting data into standard data warehouses (e.g. Amazon Redshift, Microsoft SQL Server, Google BigQuery or Snowflake).
Experience architecting data warehouses and/or data lakes for large amounts of structured and unstructured data.
Experience with data lakes and expertise in designing and maintaining BI solutions.
Experience with workflow management tools and platforms, such as Airflow.
Extensive experience in Python programming, or related languages.
Experience with RDBMS and NoSQL databases (e.g. MongoDB).
Experience in packaging and deploying applications on-premise and in the cloud (e.g. AWS).
Familiarity with modern development practices and DevOps.
Interest in building non-standard medical software applications, in collaboration with medical partners. Strong cross-disciplinary and analytical skills.
Master’s degree in computer science or a related field, or equivalent years of experience.
6+ years of industry experience as a software/data engineer.
