Data Curation Engineer - remote

Calico
Posted 3 years ago
Stack Overflow

Who we are:

Calico is a research and development company whose mission is to harness advanced technologies to increase our understanding of the biology that controls lifespan, and to devise interventions that enable people to lead longer and healthier lives. Executing on this mission will require an unprecedented level of interdisciplinary effort and a long-term focus for which funding is already in place.

Position Description

As a Data Curator / Engineer, you will work closely with Calico scientists, external collaborators, and contract research organizations to help store and provide access to large, complex, and diverse biological datasets. You will develop schemas to accurately capture and document experimental results and methods at an appropriate technical level. You will advise and assist scientists and data scientists in best practices for biological metadata management. You must be able to learn and work independently, yet collaborate well with coworkers and share their passion to advance Calico’s quest to understand aging and age-related disease.

Responsibilities

  • Work with scientists to identify optimal ways to prepare, annotate, store and navigate their datasets
  • Work with software & information technology teams to specify, design, and implement the infrastructure for storing, searching, visualizing and integrating experimental datasets
  • Define and document best practices for capturing and entering experimental metadata, and educate scientists and collaborators about these standards
  • Assist labs in data and metadata submission
  • Write scripts to submit and verify data and metadata
  • Track the flow of data within ETL and analysis pipelines, ensuring successful processing and data validity

Requirements

  • 3+ years of experience curating (organizing, cleaning and efficiently manipulating) scientific datasets
  • Advanced knowledge of biology (degree in life sciences or computational biology, and/or experience working in a biology lab environment)
  • Detail-oriented with strong organizational, project management and analytical skills
  • Ability to work effectively with scientists to elucidate data organization needs and translate them into written requirements and specifications
  • Ability to understand scientific literature, experimental procedures and their limitations, and current needs of the research community
  • Ability to provide specification and review as part of software development
  • Experience programming with Python, including basic data loading and analysis