Senior Data Scientist (NLP) - remote

Parenthetic
Posted 3 years ago  • Arlington, VA

We are building human-driven NLP technologies and are seeking an experienced, proactive data scientist to join our newly formed engineering team. The ideal candidate will have a strong background in natural language processing methods and experience with a range of text classification problems. We are particularly interested in event extraction/detection, entity recognition, slot-filling tasks, document classification, and zero-/few-shot learning. Experience with forecasting, time-series analysis, and/or active learning methods is a plus. Because this is a new team, we are looking for candidates who are willing to help grow the organization by taking on a range of responsibilities across the technical spectrum, and who can collaborate effectively to deliver findings to our customers.

The position may require some on-site work in Northern Virginia for team and client meetings.

Responsibilities
Your day-to-day will include:

  • Research and develop machine learning methods for a wide range of text extraction and classification tasks, often with limited labels and/or in multiple languages
  • Design and implement tools for monitoring and forecasting trends in signals derived from text data
  • Work effectively in an often self-directed environment: estimate timelines, communicate progress, and identify avenues for future research and development
  • Perform analyses and generate detailed data products for internal stakeholders and external clients
  • Establish MLOps principles in our software development and platform engineering practices
  • Deliver version-controlled, documented, and reproducible analyses and experiments that can be readily transitioned into scalable inference services

Work Experience and Skills

  • Advanced degree in computer science, math/statistics, engineering, linguistics, social science, or a related field
  • 5+ years of experience in data science (flexible depending on academic research experience)
  • Proficiency with major Python data science libraries, including the SciPy stack and scikit-learn
  • Experience with at least one deep learning and/or NLP framework (TensorFlow, PyTorch, Transformers, etc.)
  • Understanding of pre-trained language models such as BERT and GPT
  • Familiarity with other commonly used technologies, such as Linux and SQL/NoSQL databases
  • Ability to use Git, as well as other version control, workflow, and project management tools
  • Strong communication skills, with the ability to explain complex ideas clearly and concisely to a range of audiences
  • Aptitude for learning quickly and a willingness to take on a wide range of responsibilities

Preferred Qualifications

  • Experience with other technologies and platforms in our stack, including: Elasticsearch, Kibana, Docker, DVC, Kubernetes, GCP, GitLab
  • Prior work in the marketing/communications and/or defense sectors
  • Ability to obtain and/or maintain a US government security clearance