Responsibilities
- Implement a large-scale data warehouse on AWS
- Implement high-performance data pipelines that scale to process petabytes of data daily
- Design, implement and maintain ETL processes
- Implement Directed Acyclic Graphs (DAGs) in Apache Airflow to programmatically author, schedule, and monitor workflows (see the DAG sketch after this list)
- Design and build REST APIs using the Python Flask framework (see the Flask sketch after this list)
- Work with data scientists to productionize machine learning algorithms for real-time fraud detection
- Work with data analysts to automate and optimize reporting and BI infrastructure
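
For context on the Airflow responsibility above, here is a minimal DAG sketch, assuming Airflow 2.x; the DAG name, schedule, and task bodies are illustrative placeholders, not an actual workflow used by the team.

```python
# Minimal Airflow 2.x DAG sketch; names and logic are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder extract step (hypothetical).
    print("extracting source data")


def load():
    # Placeholder load step (hypothetical).
    print("loading into the warehouse")


with DAG(
    dag_id="daily_etl",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run extract before load.
    extract_task >> load_task
```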
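
Similarly, a minimal Flask sketch of the kind of REST endpoint implied above; the route, payload fields, and scoring logic are purely hypothetical.

```python
# Minimal Flask REST endpoint sketch; route and payload shape are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/scores", methods=["POST"])
def score_transaction():
    # Accept a JSON transaction and return a dummy fraud score.
    payload = request.get_json(force=True)
    score = 0.0 if payload.get("amount", 0) < 1000 else 0.5  # placeholder logic
    return jsonify({"transaction_id": payload.get("id"), "fraud_score": score})


if __name__ == "__main__":
    app.run(debug=True)
```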
Requirements
- At least 5 years of experience as a data engineer or back-end developer
- Proficient in Java and Python programming
- Proficient in Apache Spark and Apache Airflow (see the Spark sketch after this list)
- Proficient in writing and optimizing SQL statements
- Proficient with AWS and/or cloud computing
- Experienced with AWS data engineering services such as Athena, Redshift, SageMaker, Kinesis, etc.
- Experienced with SQL and NoSQL databases such as DynamoDB, RDS Aurora, MySQL, Elasticsearch, Solr, etc.
- Experienced with BI tools such as Tableau, Amazon QuickSight, etc.
- Experienced in using monitoring tools and instrumentation to ensure optimal platform and application performance
- Experienced in both streaming and batch data processing
- Knowledge of machine learning concepts will be an advantage
- Knowledge of Scala will be an advantage
- Prior experience working with cross-functional data and tech teams will be an advantage
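
As a rough illustration of the Spark proficiency expected, here is a minimal PySpark batch job sketch; the S3 paths, column names, and aggregation are assumptions, not part of the actual pipelines.

```python
# Minimal PySpark batch job sketch; paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_aggregation").getOrCreate()

# Read raw events from S3 (hypothetical path), aggregate per user, write back.
events = spark.read.parquet("s3://example-bucket/events/dt=2024-01-01/")
daily_totals = (
    events.groupBy("user_id")
          .agg(
              F.sum("amount").alias("total_amount"),
              F.count(F.lit(1)).alias("event_count"),
          )
)
daily_totals.write.mode("overwrite").parquet(
    "s3://example-bucket/aggregates/dt=2024-01-01/"
)

spark.stop()
```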