The client is a large US-based enterprise that aggregates public and government financial data and makes it easily accessible to US citizens. The company handles terabytes of data and runs sophisticated models on it.
The project involves aggregating and analyzing various types of data from multiple sources and, once the analysis and ETL are complete, reintegrating the results back to the data providers.
Responsibilities include building various data engineering pipelines using Python and Airflow. We do not use PySpark or Hadoop on this project.
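For a sense of the day-to-day work, here is a minimal sketch of a pipeline of this kind, written with Airflow's TaskFlow API and pandas. It is illustrative only: the DAG name, columns, file paths, and sample data are hypothetical placeholders, not the client's actual sources, and it assumes Airflow 2.4+ for the `schedule` argument.

from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(
    dag_id="financial_aggregation_example",  # hypothetical DAG id
    schedule="@daily",                       # assumes Airflow 2.4+
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def financial_aggregation_example():

    @task
    def extract() -> str:
        # Stand-in for pulling a public dataset; real sources would vary.
        df = pd.DataFrame({"agency": ["A", "B", "A"], "spend": [100, 250, 50]})
        path = "/tmp/raw.csv"  # placeholder path
        df.to_csv(path, index=False)
        return path

    @task
    def transform(path: str) -> str:
        # Aggregate with pandas, a typical transform step in this stack.
        df = pd.read_csv(path)
        out = df.groupby("agency", as_index=False)["spend"].sum()
        out_path = "/tmp/aggregated.csv"  # placeholder path
        out.to_csv(out_path, index=False)
        return out_path

    transform(extract())


financial_aggregation_example()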
All the attributes of an established senior team are in place: extensive documentation, CI/CD support, good test coverage, a solid roadmap, and a team of experienced, product-oriented engineers.
Requirements:
Solid background in software engineering and an understanding of common patterns and design principles.
Experience with data engineering using Python and pandas; exposure to Airflow.
Due to compliance requirements, the candidate must be located in the US.
Experience with Scala (a plus).
Strong knowledge of functional programming patterns.
Ability to overlap with PST.