Lead Data Scientist (NLP, CV) - remote

IHS Markit
Posted 4 years ago

Your Role:
You will be responsible for researching and developing the intelligence behind the document structure understanding solutions in IHS Markit products. The role involves designing and implementing intelligent content understanding pipelines that turn raw documents into structured knowledge.

Your duties will include:

  • Technically leading a team of data scientists and ML engineers to develop production-ready components based on recent SOTA approaches for CV and NLP
  • Hands-on researching, prototyping and building of content understanding pipelines and the working models that form their components, including state-of-the-art neural architectures in the combined Computer Vision and Natural Language Processing domains (data augmentation, model selection, pre-training, optimization, etc.)
  • Converting business problems into data science tasks in collaboration with Product and Project managers
  • Defining and/or influencing the strategy of research-intensive projects, including goal metrics, development process, toolsets and communications for optimal progress
  • Hunting for quality datasets, including driving the development of datasets from scratch
  • Learning and sharing new things around ML/DL for CV and NLP to keep the AI team on the cutting edge

About You:
You are a data scientist/deep learning engineer experienced in building working intelligent solutions for the analysis of unstructured content, and you are motivated by complex and fuzzy challenges. Your required qualifications and experience include:

  • Degree in data science, math & computer science, statistics or a related field
  • 4+ years of professional experience in data science and deep learning with application to Computer Vision and/or Natural Language Processing
  • Soft skills of a technical team leader (communication, engagement, patience, motivation, integrity)
  • Strong programming and engineering skills in Python
  • Strong practical experience building your own DL models with TensorFlow, Keras, PyTorch or similar frameworks
  • Developed skills in algorithms and data structures
  • Solid understanding of statistics and math behind deep learning
  • English language (B1+)

The following will hugely increase our interest:

  • Application of Computer Vision to Document Understanding (OCR, object detection, text embedding with custom pre-training, or similar)
  • Strong experience with data analysis tools
  • PhD degree in a related field
  • Publications in a related domain
  • Linux user experience

What we offer:

Open and Collaborative Environment:

  • In-house product development based on science and technology
  • Personal growth and career development supported at the corporate level
  • Support of self-study and research
  • Development of our own deep learning architectures
  • Custom datasets from a team of professional annotators
  • Training on a powerful private GPU cloud
  • Research and application of state-of-the-art models
  • Development of our own unique AI-driven products that work out of the box and are loved by the world's top companies
  • Great colleagues and an open atmosphere in the workplace
  • Sharing of knowledge and discoveries inside and outside the team
  • Collaboration with a great team of ML professionals
  • Participation in international workshops and conferences
  • Continuous education with invited tutors and paid online programs

Employee benefits:

  • English language classes
  • Employee stock option plans
  • Vacation time that increases with tenure
  • Extended medical insurance for employees and their families
  • Personal accident coverage
  • Employee assistance program
  • Reimbursement of sports activities
  • Corporate and social events