Genestack’s mission is to accelerate health discoveries by harnessing the explosive growth of biomedical data. Because data will be the core currency of the future, organisations must manage data effectively to remain competitive. Genestack was founded in 2012 in Cambridge, UK, and our product, the Omics Data Manager (ODM), is now transforming how scientists work collaboratively with multi-omics data in drug development, crop improvement and other fields.
One of the most urgent needs in data management is to source, organise and prepare “clean” data in a data management system such as ODM. The data are then ready for bioinformatics data scientists to analyse drug target validity or discover biomarkers, and for artificial intelligence/machine-learning (AI/ML) algorithms to use as good-quality training sets. The bio- and agri-industries are becoming aware that good-quality data are a prerequisite for avoiding spurious results in downstream scientific discoveries.
If you are a data manager with a biology background, and are eager to make this prerequisite a reality as soon as possible for data scientists at Genestack and our ODM customers, we want to hear from you.
- Source, import and maintain high-quality data in ODM by performing data extract-transform-load (ETL) functions:
  - retrieving data regularly from large public repositories or ad hoc from individual high-profile projects,
  - creating, running, adapting and maintaining data transformation Python scripts,
  - loading the transformed data into ODM by running existing data loading code.
- Collaborate closely with Genestack data scientists by preparing “clean” data, allowing your colleagues to focus on tasks such as performing statistical analyses on the data.
- Maintain and update existing ontologies/vocabularies/dictionaries and auxiliary data (e.g. reference genome sequences) to ensure interoperability of the data.
- Develop and grow business relationships with our customers’ data managers and/or engineers by providing expert advice on how to prepare “clean” data, so our customers can get the maximum value out of ODM.
Required technical competencies
- You are proficient in ingesting, persisting and exporting large volumes of biological omics data and metadata from disparate sources using common file transfer protocols (e.g. HTTP, FTP) and RESTful APIs.
- You have domain knowledge of omics data captured in common data formats (e.g. GFF, GCT, VCF, FASTA, FASTQ, BAM).
- You have used or constructed controlled vocabularies/ontologies, and are familiar with the related file formats (e.g. OWL, OBO, SKOS).
- You are proficient in writing bespoke scripts in Python for wrangling and transforming data from a source into required formats.
- You pay close attention to detail, so that subtle differences or discrepancies in data are not overlooked.
- You understand how auxiliary/reference data (e.g. reference genome sequences) are used in the analysis and interpretation of omics data.
- You understand what a "data model" is and, using your omics domain knowledge, can make recommendations to Genestack software engineers/architects on its design.
Desirable technical background
- You are familiar with biological and/or clinical data standards and formats.
- You have worked in a biology R&D laboratory in an academic or industry setting.
- You have a basic understanding of how data search platforms work, e.g. Solr.
Desirable personal qualities
- You are a passionate ambassador for good-quality, well-managed data that markedly improves the precision of R&D discoveries by our customers.
- You take pride in working persistently through high volumes of messy data under business time pressures, a task that some scientists resent as mundane.
- You are a competent communicator, both verbally and in writing, who enjoys working in cross-functional teams both in Genestack and with our customers.
- You are proactive in flagging and mitigating any risks that could spoil our customers’ ODM experience.
Your career growth at Genestack
Genestack is a fast-growing company at the exciting intersection of big data, technology and genomics. Working with us, you will be:
- Enabling data-driven transformation for leading life sciences organisations.
- Supported by professional training and mentoring schemes.
- Trusted with plenty of pivotal opportunities and responsibilities.
- Building a diverse and far-reaching professional network.
- Well-compensated with company-performance bonuses, health services, and above-rate pension contribution.
If this role excites you and you meet these criteria, we would love to hear from you!
Location: Cambridge, UK.
Level of experience: Both junior and mid-level data managers are considered.
Minimum requirement: A degree in bioinformatics or equivalent professional experience.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, colour, national origin, gender, sexual orientation, age, marital status or disability. Genestack is expanding internationally, creating opportunities for individual career growth. Now is a great time to get involved. Learn more about our culture, values and the team at http://www.genestack.com/careers.