Role Overview:
We need a Data Scientist to work alongside our CTO and our existing Data Scientist to help us deliver on a busy product roadmap. We have a number of core features and services to build, and it is more than two people can deliver alone.
Reporting to the CTO, you'll work on a range of interesting problems, collaborating with our external partners and internal stakeholders to understand the problems we need to solve and translate them into modelling requirements.
What will you be working on for the next 12 months?
- Work closely with the CTO and data science team to research, design and develop algorithms that extract information from unstructured policy documents
- Apply your knowledge and experience of the latest deep learning and transformer-based NLP techniques, refining them to meet our information extraction needs
- Build information extraction models using supervised techniques including classification, entity recognition and entity relation extraction (see the sketch after this list)
- Research and develop approaches to parse PDFs and other document formats, extracting document structure, text and data into machine-readable formats
- Develop approaches for efficient data labelling and for bootstrapping the creation of training and test datasets
- Devise and test methods that can adapt to varying information extraction needs, staying flexible in how policy concepts are defined
- Communicate research progress and findings in blog posts, articles and papers
- Be a strong advocate of open source, playing a key role in providing open-access datasets and models to the climate policy and machine learning communities
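To make the information extraction work above concrete, here is a minimal sketch of entity recognition over policy-style text with a pretrained transformer. It assumes the Hugging Face transformers library and an illustrative public NER checkpoint; the model name, example sentence and labels are placeholders, not our actual pipeline.

```python
# Minimal sketch, not our pipeline: entity recognition over policy text using
# a pretrained transformer. Assumes the Hugging Face `transformers` library;
# the checkpoint name and example sentence are illustrative only.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",        # hypothetical choice of checkpoint
    aggregation_strategy="simple",      # merge word pieces into entity spans
)

text = "The Climate Change Act 2008 commits the United Kingdom to net zero emissions by 2050."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```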
Key skills/experience needed:
- PhD in a relevant discipline (ML/NLP) or a Master's plus at least two years' experience
- You have worked in an academic or industry environment where you carried out machine learning research on applied problems
- Deep understanding of machine learning, including both supervised and unsupervised methods
- Practical experience of applying deep learning to NLP; proficiency in deep learning frameworks such as PyTorch (ideally) or TensorFlow
- Solid understanding of deep learning architectures, including transformers, attention and continuous representations; transfer learning and pretrained model fine-tuning (see the sketch after this list); you understand what these models are doing as well as how to use them to obtain optimal results
- Excellent Python programming skills and knowledge of how to write well-structured, maintainable research and production-quality code; version control using Git; unit testing
- Experience with standard numerical packages, including pandas, scikit-learn and NumPy, and data science tools such as Jupyter
- Understanding of the research-to-production lifecycle, including how to deploy models and what it takes for a deployed model to be effective
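As a rough illustration of the transfer learning and fine-tuning point above, the sketch below runs a single training step of a pretrained transformer on a toy classification batch. It assumes PyTorch plus the Hugging Face transformers library; the checkpoint, data and learning rate are placeholders rather than a recommended setup.

```python
# Minimal fine-tuning sketch (one training step on a toy batch), assuming
# PyTorch and Hugging Face `transformers`; checkpoint, data and learning rate
# are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"              # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

batch = tokenizer(
    ["This policy sets a national emissions reduction target.",
     "The committee met on Tuesday."],
    padding=True, truncation=True, return_tensors="pt",
)
labels = torch.tensor([1, 0])                       # toy binary labels

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)             # loss computed internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```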
Nice to haves:
- Experience of semi-supervised techniques such as active learning, weak supervision and zero/few-shot learning would be beneficial (see the sketch below)
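For the zero/few-shot point, here is a minimal sketch of zero-shot classification of a policy sentence against candidate concept labels. It assumes the Hugging Face transformers zero-shot pipeline built on an NLI model; the checkpoint and labels are illustrative, not a fixed taxonomy.

```python
# Minimal zero-shot classification sketch, assuming Hugging Face
# `transformers`; the NLI checkpoint and candidate labels are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")   # hypothetical checkpoint

text = "The policy introduces a carbon tax on heavy industry from 2026."
labels = ["carbon pricing", "renewable energy targets", "adaptation"]

result = classifier(text, candidate_labels=labels, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```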
Career progression:
- Progression path: Senior Data Scientist > Lead Data Scientist, leading a small team