Next-generation sequencing

NGS

Next-generation sequencing technology produces large amounts of data, on the order of 100 Gb per genome. Besides the need for data storage, sequence analysis relies on methods that are computationally demanding. Studies that compare many genome sequences, such as genome-wide association studies, demand even more storage capacity. Computational processes such as alignments of large amounts of data can typically be parallelised, so that the application runs efficiently on the Big Grid infrastructure. Smaller projects may also be suitable for Cloud computing.
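The parallelisation mentioned above usually follows a scatter/gather pattern: the reads are cut into independent chunks, each chunk is aligned separately, and the results are merged. The sketch below only illustrates that pattern. The function names are hypothetical, the exact-match search is a toy stand-in for a real aligner, and a local thread pool stands in for grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(reads, chunk_size):
    """Scatter step: cut the read list into independent chunks."""
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]


def align_chunk(chunk, reference):
    """Toy stand-in for an aligner (assumption): exact-match position, or -1."""
    return [(read, reference.find(read)) for read in chunk]


def scatter_gather(reads, reference, chunk_size=2, workers=4):
    """Align each chunk independently, then merge results in input order."""
    chunks = split_into_chunks(reads, chunk_size)
    # A thread pool stands in for separate grid jobs (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(lambda c: align_chunk(c, reference), chunks))
    # Gather step: concatenate the per-chunk results.
    return [hit for chunk_hits in per_chunk for hit in chunk_hits]


reference = "ACGTTGCAAGGCTA"
reads = ["ACGT", "TGCA", "GGCT", "TTTT", "AAGG"]
hits = scatter_gather(reads, reference)
for read, position in hits:
    print(read, position)
```

Because no chunk depends on another, the same split could in principle be submitted as separate jobs, with only the final merge needing all results in one place.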

e-BioGrid is open to taking on more Next-Generation Sequencing support projects. Contact us if you are involved in NGS and need support with software or hardware infrastructure.


Projects in this technology area

Grid- and cloud computing for high-throughput assembly and annotation of (meta)genome sequences *
description:Metagenomics analyses are based on next-generation sequence data. Assembling reads into contigs, and functionally annotating either contigs or reads, requires significant computing resources. Creating Grid and Cloud computing pipeline solutions for next-generation sequence data analysis would be a beneficial contribution to effective metagenomics research.
applicant:Sacha van Hijum, Center for Molecular and Biomolecular Informatics
results:The e-BioGrid programmer's position has recently been filled by Jumamurat Bayjan, who is now ready to start developing a pipeline for the applicant's next-generation sequencing platform. This will be done in close collaboration with the NBIC NGS taskforce and the other e-BioGrid team members involved in NGS data analysis.
status:ongoing
team:Victor de Jager (CMBI), Machiel Jansen, Niek Bosch, Jumamurat Bayjan
type:This is a main project.
e-science for next generation sequence pipeline
description:In this project we would like to explore a solution that enables high-throughput processing of next-generation sequence data on a grid or cloud. Two large next-generation sequence data sets are available or will arrive in April or May 2011: the Dutch Genome Project (250 Dutch trios, parents plus child) and the Leiden Longevity Studies (222 individuals with a longevity phenotype). Their raw data will amount to about 60 TB and 100 TB, respectively. A simplified pipeline has been prepared on a local cluster to process the pilot data of the Dutch Genome Project. This pipeline will be the starting point for exploring a comprehensive solution that ports all necessary tools to a grid environment.
applicant:Kai Ye, Leiden University Medical Centre
results:none so far; tests are ongoing
status:ongoing
team:Eline Slagboom, Kai Ye, Jan Bot, Evert Lammerts
type:This is a dedicated project.
Accelerating OpenMX genome wide association studies
description:In a genome-wide association (GWA) analysis, genetic variants (Single Nucleotide Polymorphisms, SNPs) across the whole genome are tested for association with a certain trait, such as body weight or a certain disorder. With the data currently available, this means that 1.5 to 4.5 million tests are performed. These tests can be set up using structural equation modeling, in which covariance structures with fixed effects are analyzed. Because of the large number of tests, GWA analysis is computationally expensive. As genomic data are produced at increasing density and rapidly decreasing cost, the need to apply state-of-the-art high-performance computing methods in GWA analyses is becoming urgent. Approaches to this problem are to use grid technology, and to use the computer hardware more efficiently, either by making use of GPUs or by optimizing the algorithms used.
applicant:Han Rauwerda, University of Amsterdam
results:A 20- to 40-fold reduction in computing time, achieved by an algorithm using symbolic algebra
status:completed
team:Marijn van Eupen, Matthijs Kattenberg, Michel Nivard, Han Rauwerda, Dorret Boomsma
type:This is a dedicated project.
DNA Sequencing on the e-BioInfra platform
description:The e-infrastructure for bioscience research (e-BioInfra) is routinely used by researchers at the AMC to analyse genomics data on the Dutch Grid, in particular for Next Generation Sequencing (NGS). The analysis steps are implemented as workflows that are executed on the grid in an automated fashion. Bioinformaticians at the AMC primarily run these workflows using the VBrowser, which also facilitates data manipulation on the grid storage. Selected applications are also available to novice users through the web interface of the e-BioInfra gateway. The goal of the project is to enable and enhance genomics research via advanced tools for data analysis. This is achieved in close collaboration with bioinformaticians.
applicant:Silvia Olabarriaga, on behalf of the VLEMED VO, Amsterdam Medical Centre / University of Amsterdam
results:Angela CM Luyf, Barbera DC van Schaik, Michel de Vries, Frank Baas, Antoine HC van Kampen, Silvia D Olabarriaga. Initial steps towards a production platform for DNA sequence analysis on the grid. BMC Bioinformatics. 2010 Dec 14;11:59
status:ongoing
team:Barbera van Schaik, Antoine van Kampen, Angela Luyf, Marcel Willemsen, Aldo Jongejan, Silvia Olabarriaga, Mark Santcroos, Jan Just Keijser, Shayan Shahand, Vladimir Korkhov, Souley Madougou
type:This is a dedicated project.
cloud application animal sciences
description:We want to investigate the use of Linux VMs with specific phylogenomics and population genomics software installed, in order to do whole-genome coalescent and phylogenetic analyses that are currently outside our computational reach. The most important request here is for CPU time (e.g. several weeks of wall time with 24 processors).
applicant:Hendrik Jan Megens, Wageningen University, Animal Breeding and Genomics Centre
results:
status:ongoing
team:Jan Bot, Hendrik Jan Megens
type:This is a dedicated project.
Annotation pipeline for microbial genomes
description:The goal of the project is to have an annotation pipeline for microbial genomes. A pipeline connecting freely available software already exists as a stand-alone version. Ideally it should be upgraded to a faster version (a step involving BLAST is currently the bottleneck) and made accessible to other users through a web interface.
applicant:Genevieve Girard, IBL, Leiden University
results:none so far; project is starting
status:ongoing
team:Jan Bot, Genevieve Girard
type:This is a dedicated project.


* This is the main project in this technology area
