We are building human-driven NLP technologies and are seeking an experienced, proactive data scientist to join our newly formed engineering team. The ideal candidate will have a strong background in natural language processing methods and experience with a range of text classification problems. We have a particular interest in event extraction/detection, entity recognition, slot-filling tasks, document classification, and zero-/few-shot learning. Secondary experience in forecasting, time series, and/or active learning methods will also be beneficial. As this is a new team, we are looking for candidates who are willing to help grow the organization by taking on a range of responsibilities across the technical spectrum and who can collaborate effectively to deliver findings to our customers.
The position may require some on-site work in Northern Virginia for team and client meetings.
Responsibilities
Day to day, you will:
- Research and develop machine learning methods for a wide range of text extraction and classification tasks, often with limited labels and/or in multiple languages
- Design and implement tools for monitoring and forecasting trends in signals derived from text data
- Work effectively, in an often self-directed environment, to estimate timelines, communicate progress, and identify avenues for future research and development
- Perform analyses and generate detailed data products for internal stakeholders and external clients
- Institute MLOps principles in our software development practices and platform development
- Deliver version-controlled, documented, and reproducible analyses and experiments that can be readily transitioned into scalable inference services
Work Experience and Skills
- Advanced degree in computer science, math/statistics, engineering, linguistics, social science, or a related field
- 5+ years of experience in the data science field (this is flexible depending on academic work)
- Proficiency with major Python data science libraries, including the SciPy stack and scikit-learn
- Experience with at least one deep learning and/or NLP framework (TensorFlow, PyTorch, Transformers, etc.)
- Knowledge and understanding of pre-trained language models like BERT and GPT
- Familiarity with other commonly used technologies, such as Linux operating systems and SQL/NoSQL databases
- Ability to use Git, as well as other version control, workflow, and project management tools and technologies
- Strong communication skills, with the ability to convey complex ideas clearly and concisely to a range of audiences
- Aptitude for learning quickly and a willingness to take on a wide range of responsibilities
Preferred Qualifications
- Experience with other technologies and platforms in our stack, including Elasticsearch, Kibana, Docker, DVC, Kubernetes, GCP, and GitLab
- Prior work in the marketing/communications and/or defense sectors
- Ability to obtain and/or maintain a US government security clearance