As a Senior Machine Learning Engineer on the team, you will have an outsized impact on our applied machine learning research and production systems. You’ll work with data scientists, data engineers, and colleagues across the organization to build prototype models, iterate on them, launch them, and monitor their performance in production. For this role, we’re looking for a generalist with a natural language processing background. You will primarily work on our text categorization and scoring models using tools like spaCy, Spark NLP, Textacy, Gensim, scikit-learn, and TensorFlow.
In this role, you’ll work with a modern data stack and a serverless streaming data architecture. Our stack is a collection of microservices built with tools such as AWS Lambda, Kinesis Firehose, Amazon S3, AWS Glue, Amazon Athena, API Gateway, SageMaker, Mode Analytics, and Spark (Databricks).
About You
- You have a BS or higher in Computer Science, Mathematics, Statistics, Economics, or another quantitative field.
- You have at least two years of experience working on applied machine learning systems in production cloud environments (AWS, Google Cloud, etc.).
- You have experience across the entire machine learning product lifecycle, from initial data ingestion and data prep, through modeling and creating REST API endpoints or managing batch inference workloads, to monitoring model performance and evaluating drift.
- You’re technically competent with the Python data science ecosystem (pandas, NumPy, SciPy, scikit-learn, Jupyter); Apache Spark and associated frameworks (Spark NLP, Spark Streaming, Spark MLlib); and TensorFlow/Keras.
- You have production experience with messy natural language systems. You know all about tokenization, feature vectorization, word embedding, and training transformer-based language models. You’ve seen algorithms fail due to bad input data.
- You are a skilled written communicator. Our team is 100% remote and writing is our primary means of communication. You communicate complex technical topics clearly and in an approachable way.
- You enjoy collaboration and knowledge sharing. You appreciate our team’s value of humility, and you are eager to collaborate with teammates at any level of statistical or engineering knowledge.
- You have experience documenting projects and processes with tools like Jira and Confluence.
- You understand that perfect is the enemy of good, and you avoid over-engineered solutions. You like to find simple solutions that work, iterating on them as needed.
Responsibilities
- Write clean, easy-to-read Python/PySpark code, with plenty of comments for your peers and your future self.
- Develop and iterate on our text categorization and scoring models and their associated production systems.
- Build dashboards and assist with internal analytics needs.
- Work with Data Engineering to iterate on the pipelines that feed your models.
- Craft internal memos to keep everyone up to date on the status and performance of models and systems.
- Mentor junior engineers on staff.
- Propose new projects to the Data Science leadership team.
- Contribute to the project planning process by estimating tasks and deliverables.