About the team/job
The EMBL (European Molecular Biology Laboratory) is looking for a research assistant to join the Genomics Core Facility (GeneCore) team at its main laboratory in Heidelberg to participate in projects carried out in collaboration with Cellzome/GSK, located on campus. The selected candidate will work in an interdisciplinary environment connecting applications in genomics and proteomics related to drug discovery. We are looking for a bioinformatician/computational biologist to join the team.
Your role
The successful candidate will:
- monitor the data production and data processing pipelines;
- provide analysis, management and integration of massively parallel sequencing data generated with the range of sequencing technologies available in GeneCore (Illumina, Pacific Biosciences, 10x Genomics, Oxford Nanopore, (spatial) single-cell sequencing) and library preparation protocols (DNA-seq, RNA-seq, ChIP-seq, scRNA-seq, and long-read sequencing protocols);
- participate in the maintenance and evolution of the GeneCore laboratory information management system (LIMS).
The main task will be the design, implementation and maintenance of complete computational workflows and pipelines required for the efficient and timely delivery of large-scale sequencing data, including their comprehensive informatics processing and biological analysis. The role involves developing data analysis strategies for multi-omics analyses using state-of-the-art concepts from computational biology, algorithmic bioinformatics and biostatistics, so prior experience in one of these areas is of particular interest. Familiarity with high-performance computing environments, job scheduling, load balancing and parallel computing is considered a valuable asset. Knowledge of data management concepts (FAIR principles), data integration, (meta)data handling and the design of REST APIs is considered a plus.
The candidate will be responsible for:
- developing computational workflows to monitor data production and to analyze short-read and long-read sequencing data sets;
- implementing core pipelines for basecalling, de-multiplexing, data quality control, sequence alignment, variant calling, quantifying gene expression and single-cell analyses;
- designing workflows that are maintained and disseminated to the research community using widely used code hosting platforms (GitHub/GitLab), container technologies (Docker, Singularity) and workflow engines (Nextflow, Snakemake);
- analyzing massively parallel sequencing data sets to support other EMBL researchers and scientists from EMBL member states;
- maintaining the integrity of the GeneCore LIMS and optimizing it;
- teaching and co-organizing scientific courses that train junior researchers at EMBL and elsewhere in key applications of massively parallel sequencing data relevant to omics projects.
You have
- ideally, a PhD in computational biology or a related field with a strong focus on bioinformatics and sequencing data analysis
- advanced programming skills, ideally including R, Python, Unix/Bash
- strong interest and experience in biological data analysis and scientific software development
- experience in setting up bioinformatics services and computational workflows for large-scale multi-omics data sets using software pipelines in an HPC or cloud environment
- experience in (meta)data management, data transfer technologies, database technologies, and data integration
- the desire to support and assist biological researchers in their bioinformatics analysis
- the ability to integrate tools into pipelines and workflows and optimize their interoperability, efficiency, usability and portability
- the desire to teach in workshops and courses on the analysis of omics data
The successful candidate will be part of an international interdisciplinary core facility team and needs to be well organized, op