The Post relies heavily on data-informed decision making at both the strategic and operational level. Over the years, The Post has seen significant growth in traffic and subscriptions across platforms and channels. The increased data volume and velocity, coupled with new sources and greater complexity, have created new challenges. To address these challenges and get the most from our data, we are creating an integrated customer dataset by stitching together signals from various data sources. This integrated customer data will power marketing and personalization efforts through enhanced workflows, automations, and data activations on both homegrown and vendor platforms.
We are calling this initiative WaPo 360. As our Senior Data Engineer, you'll be part of a cross-functional team, working in close collaboration with analytics, engineering, and other stakeholders. You will build and maintain the various systems that make up WaPo 360, and you will be responsible for the day-to-day operations of systems that depend on data, ensuring data is properly processed and securely transferred in a timely manner. Processing data includes managing, manipulating, storing, and parsing data in a data pipeline for a variety of target systems. You will also support the maintenance of applications and tools that reside on these systems, such as upgrades, patches, and configuration changes.
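For a sense of what this signal stitching can involve, here is a minimal PySpark sketch. The source tables (`raw.web_events`, `raw.subscription_records`), the join key (`customer_id`), and the output table are purely illustrative assumptions, not a description of The Post's actual schemas or pipelines.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("wapo360-stitch-sketch").getOrCreate()

# Hypothetical source tables; real sources and schemas will differ.
web_events = spark.table("raw.web_events")               # clickstream signals
subscriptions = spark.table("raw.subscription_records")  # subscription state

# Reduce each source to one row per customer before stitching.
recent_activity = (
    web_events
    .groupBy("customer_id")
    .agg(
        F.max("event_ts").alias("last_seen_ts"),
        F.countDistinct("platform").alias("platforms_used"),
    )
)

# Stitch the signals into a single integrated customer record.
integrated_customer = (
    subscriptions
    .join(recent_activity, on="customer_id", how="left")
    .select("customer_id", "plan", "last_seen_ts", "platforms_used")
)

# Persist for downstream marketing and personalization consumers.
integrated_customer.write.mode("overwrite").saveAsTable("curated.integrated_customer")
```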
Responsibilities
- You will work closely with stakeholders across departments to architect, build and deploy various initiatives within WaPo 360
- You will participate in all stages of software development, from early brainstorming to coding and bug fixing
- You will design, develop, deploy, and maintain data services and pipelines on AWS
- You will develop best practices and approaches to support continuous process automation for data ingestion and data pipeline workflows
- You will perform multiple tasks simultaneously under changing requirements and deadlines
- You will prepare and present reports, analyses, and presentations to various stakeholders, including executives
Requirements
Minimum Qualifications
- Bachelor's degree in computer science, engineering, mathematics, or a related technical discipline
- 5+ years of experience as a Data Engineer or in a similar role
- 5+ years of experience with data modeling, data warehousing, and building ETL pipelines
- Expertise in SQL and query optimization
- Strong Hive and PySpark skills; experience building business-critical data processing pipelines on AWS
- Industry experience in software development, data engineering, business intelligence, data science, or a related field, with a track record of manipulating, processing, and extracting value from large datasets
Preferred Qualifications
- Experience using streaming and batch processing technologies (Hadoop, Hive, HBase, Spark, and Kafka)
- Experience working with AWS big data technologies (EMR, Redshift, S3, AWS Glue, Kinesis, and Lambda for serverless ETL; see the sketch after this list)
- Knowledge of data management fundamentals and data storage principles
- Knowledge of distributed systems as they relate to data storage and computing
- Hands-on experience with multiple database technologies including Postgres, MySQL, MongoDB, and DynamoDB
- Basic scripting skills using Python and Scala
- Basic understanding of Machine Learning
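To illustrate the "Lambda for serverless ETL" pattern mentioned above, the sketch below shows a Lambda handler triggered by an S3 put event: it parses newline-delimited JSON from the landing bucket and writes a cleaned copy to a curated bucket. The bucket names, key layout, and required field are hypothetical assumptions for illustration only.

```python
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

# Hypothetical destination; real bucket names and prefixes will differ.
CURATED_BUCKET = "example-curated-bucket"

def handler(event, context):
    """Parse raw NDJSON dropped into S3, keep valid records,
    and write the cleaned file to the curated bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys arrive URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        cleaned = []
        for line in body.splitlines():
            if not line.strip():
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # drop malformed rows; a real pipeline would dead-letter them
            if "customer_id" in row:  # hypothetical required field
                cleaned.append(json.dumps(row))

        s3.put_object(
            Bucket=CURATED_BUCKET,
            Key=f"cleaned/{key}",
            Body="\n".join(cleaned).encode("utf-8"),
        )
```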