Lead Data Scientist (NLP, CV) - remote

IHS Markit

Posted 4 years ago

MachineLearning Nlp ComputerVision Ocr Tensorflow

Your Role:
You will be responsible for researching and developing the intelligence behind the document structure understanding solutions in IHS Markit products. You will design and implement intelligent content understanding pipelines that turn raw documents into structured knowledge.

Your duties will include:

Technically leading the team of data scientists and ML engineers in developing production-ready components based on recent state-of-the-art (SOTA) approaches for CV and NLP
Hands-on research, prototyping and building of content understanding pipelines and the working models that serve as their components, including state-of-the-art neural architectures in the combined Computer Vision and Natural Language Processing domains (data augmentation, model selection, pre-training, optimization, etc.)
Converting business problems into data science tasks in collaboration with Product and Project managers
Defining and/or influencing the strategy of research-intensive projects, including goal metrics, development process, toolsets and communications for optimal progress
Hunting for quality datasets, including driving the development of datasets from scratch
Learning and sharing new things around ML/DL for CV and NLP to keep the AI team on the cutting edge

About You:
You are a data scientist or deep learning engineer who has built working intelligent solutions for the analysis of unstructured content and who is motivated by complex, fuzzy challenges. Your required qualifications and experience include:

Degree in data science, math and computer science, statistics or a related field
4+ years of professional experience in data science and deep learning with application to Computer Vision and/or Natural Language Processing
Soft skills of a technical team leader (communication, engagement, patience, motivation, integrity)
Strong Python programming and software engineering skills
Strong practical experience in building your own DL models with TensorFlow, Keras, PyTorch or similar frameworks
Well-developed skills in algorithms and data structures
Solid understanding of the statistics and math behind deep learning
English language proficiency (B1 or higher)

The following will hugely increase our interest:

Application of Computer Vision to Document Understanding (OCR, object detection, text embedding with custom pre-training or other)
Strong experience with data analysis tools
PhD in a related field
Publications in a related domain
Experience as a Linux user

What we offer:

Open and Collaborative Environment:

In-house product development based on science and technology
Personal growth and career development supported at the corporate level
Support for self-study and research
Development of our own deep learning architectures
Getting custom datasets from the team of professional annotators
Training on a powerful private GPU cloud
Research and application of state-of-the-art models
Development of our own unique AI-driven products that work out of the box and are loved by the world's top companies
Great colleagues and an open atmosphere in the workplace
Sharing of knowledge and discoveries inside and outside the team
Collaboration with a great team of ML professionals
Participation in international workshops and conferences
Continuous education with invited tutors and paid online programs

Employee benefits:

English language classes
Employee stock option plans
Vacation time that increases with tenure
Extended medical insurance for employees and their families
Personal accident coverage
Employee assistance program
Reimbursement of sports activities
Corporate and social events