In life-science experimentation, microarray
technology has become an important tool in genome and gene-expression studies.
However, nearly all array experiments are unique due to differences in
experiment design, experimental procedure, level of completeness of the data
and cellular responses. Also, the methods with which array studies are analyzed
are themselves a topic of much bioinformatics research. Therefore state-of-the-art
array analysis is subject to change and must be highly flexible.
Since a command line
interface offers maximum flexibility, many bioinformaticians favor such an
environment. On the other hand, the life sciences are an interdisciplinary
field in which it is important to share methods and results. This poses a
dilemma, because a command line interface by itself has no facilities to
share and manage experiment pipelines. Another aspect of array analysis is that
at some points much computing power is needed. Since computational requirements
are far from constant over time, some kind of sharing of computational resources
e-BioGrid is open to taking microarray-dedicated projects on board. Contact us if you are involved in microarray technology and need support with software or hardware infrastructure.
Nearly all microarray experiments are unique due to differences in experiment design, experimental procedure, level of completeness of the data and cellular responses. Also, the methods with which array studies are analyzed are themselves a topic of much bioinformatics research. Therefore state-of-the-art array analysis is subject to change and must be highly flexible. Another aspect of array analysis is that at some points much computing power is needed. Here we propose to set up an architecture that implements Problem Solving Environments (PSEs) for six problem areas, from array design to downstream expression analysis. We will do this by setting up web services and by configuring dedicated Virtual Laboratory Machine Images that can be instantiated in a High Performance Compute Cloud. These Machine Images can be shared and dynamically scaled to a large virtual computer cluster. The content of these Machine Images will be documented and stored in a public Document Management Server. The resources used, input data and experimental results can be stored in a private Result Representation Server.
Timo Breit, University of Amsterdam
Software.
- A web-based microarray data quality control system (MADQC) has been developed. Based on a data matrix, a design file and a contrast matrix, this environment produces a number of calculations that enable the biologist to quickly assess the quality of the microarray experiment. Until now ~150 experiments have been processed, covering several thousands of microarrays.
- Image splitter software that makes 20-bit microarray scanner images compatible with 16-bit extraction software. To date this software has been used on 1000+ slide scans, each containing 4-12 arrays.
- Starting and stopping Cloud clusters on the fly: the Cloud Manager. The central element in this environment is a dedicated machine that runs in the cloud and acts as a manager node: it listens to requests from the outside world and can access the XML-RPC interface that is only accessible from within the cloud. Via this interface a cluster can be started and stopped. Cloud Manager is used on a daily basis, not only with NGS Designer but also when a bioinformatician needs a cluster from within a local R session. Currently we have used approx. of our 112.500 core hours on the HPC cloud.
- A web tool to design tiling microarrays for a set of (prokaryote) sequences, based on a tile step or simply on the required number of probes (Progenius*). It provides functionality to annotate the probes on the basis of the input sequences provided by the user. So far we have used Progenius to design probes for several studies, including those for UMC Utrecht on Staphylococcus aureus strains and for RIVM on Salmonella.
- A web tool (NGS Designer*) that produces microarray designs on the basis of Next Generation Sequencing reads. Parameters that can be set are: required number of probes, probe length, sequence similarity thresholds, and thermodynamic parameters such as Gibbs free energy and GC content. NGS Designer has been used to design arrays for several species, such as fruits and Chironomus riparius (PhD thesis Marino Marinkovic on gene expression in toxicant-exposed chironomids). The size of the input files requires a machine to be instantiated in the BigGrid HPC Cloud for each NGS Designer request. For this the Cloud Manager was used.
- Implemented generic tools for transcriptomics data analysis, such as ANOVA-based analyses on the cloud.
- Started the development of a normalization tool based on extensive experimental spike-ins. Currently we are evaluating the use of these spike-ins in Next Generation Sequencing based transcriptomics experimentation (Ion Proton platform).
- Developed a tool for present/absent calling in microarray-based CGH*.
- The result browser (MARSDB*) allows exploring the results of multi-factorial transcriptomics experiments. MARSDB originates from the many repetitive requests from biologists. MARSDB allows us to generate a list of significant probes at a certain p-value, to generate a list of probes in each contrast with all statistics, to make selections of probes or of probes in a GO category in each given contrast, and to determine overlaps (Venn diagrams) in a maximum of 4 contrasts. These lists can then be exported and used in other applications, such as applications for set analysis and pathway analysis. Finally, the tool can produce heatmaps. This tool has been used in several studies.
- Implemented pipelines for interpreting dose & time range-finding experiments in the context of design for experimentation. These pipelines have been of particular interest in the BioRange TASTOE project.
- We have explored ways to store, query and browse genome data using the community effort of GBrowse and GMOD. We have set up repositories for several organisms and added tracks from our array design pipeline.
- The development and support of a PSE for community-based collaboration. Currently we are preparing a prototype of this environment in which we will use the data generated in the context of the BioRange TASTOE project on early zebrafish development. The community-based research questions will specifically be of a biological nature. In this environment we will use many of the tools that have been developed in WP1, such as the result browser and the GBrowse/GMOD environments. Next to that we introduce a GIST-based tool and code repository. This repository contains entries for all our tools, programs and scripts, and is well annotated, searchable and taggable. The GitHub/GIST environment offers a means to share code snippets, but we also wanted to be able to share links to web resources and to access packages. Hence we made a mixed environment that can be put in the website of the community-based collaboration project, in which anybody can upload, annotate and tag new scripts, packages and weblinks. For this we use the web API of GIST.

* Publication in preparation
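The overlap step of the result browser (significant probes per contrast, then Venn-style overlaps across at most 4 contrasts) can be illustrated with a minimal Python sketch. This is not the MARSDB implementation, which is not described here; the function names, the input layout (one dictionary of probe ID to p-value per contrast) and the default threshold of 0.05 are all assumptions made for illustration.

```python
from itertools import combinations


def significant_probes(pvalues, alpha=0.05):
    """Return the set of probe IDs whose p-value is below alpha (hypothetical helper)."""
    return {probe for probe, p in pvalues.items() if p < alpha}


def venn_regions(contrasts, alpha=0.05):
    """Map each combination of contrasts to the probes significant in exactly those contrasts.

    `contrasts` maps a contrast name to a {probe_id: p_value} dictionary.
    The limit of 4 contrasts mirrors the limit stated in the text.
    """
    if len(contrasts) > 4:
        raise ValueError("at most 4 contrasts are supported")
    sig = {name: significant_probes(pv, alpha) for name, pv in contrasts.items()}
    names = sorted(sig)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Probes significant in every contrast of this combination ...
            inside = set.intersection(*(sig[n] for n in combo))
            # ... minus probes that are also significant in any other contrast.
            outside = set().union(*(sig[n] for n in names if n not in combo))
            exclusive = inside - outside
            if exclusive:
                regions[combo] = exclusive
    return regions


# Made-up example data: two contrasts, three probes.
example = {
    "treated_vs_control": {"probe1": 0.01, "probe2": 0.20, "probe3": 0.03},
    "late_vs_early": {"probe1": 0.04, "probe2": 0.001, "probe3": 0.50},
}
regions = venn_regions(example)
```

Each key of `regions` is one area of the Venn diagram, so the exported probe lists can be passed on to set or pathway analysis per area.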
Usage and access. The Progenius array design tool is used by UvA, RIVM, WUR and UMCU. The NGS Designer tool is also used by the UvA-IBED group. Several e-science ideas and concepts are exploited within the Virtual Lab for Plant Breeding consortium. Progenius can be run to design multistrain probes for non-standard tiling microarrays. NGS Designer is a tool to design probes based on next-generation sequencing reads for non-standard microarrays.
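As an illustration of what tiling by tile step or by required number of probes can mean, here is a minimal Python sketch. It is not the Progenius algorithm, whose probe placement and annotation logic is not described here; the function names, the even-spacing rule for a requested probe count, and the GC-content helper are assumptions made for illustration.

```python
def tile_probes(sequence, probe_length, tile_step=None, n_probes=None):
    """Return (start, probe) tuples tiled along `sequence`.

    Exactly one of `tile_step` (distance between probe starts) or
    `n_probes` (requested number of probes) must be given.
    """
    if (tile_step is None) == (n_probes is None):
        raise ValueError("give exactly one of tile_step or n_probes")
    last_start = len(sequence) - probe_length
    if last_start < 0:
        return []  # sequence is shorter than one probe
    if tile_step is not None:
        if tile_step < 1:
            raise ValueError("tile_step must be at least 1")
        starts = list(range(0, last_start + 1, tile_step))
    else:
        if n_probes < 1:
            raise ValueError("n_probes must be at least 1")
        if n_probes == 1:
            starts = [0]
        else:
            # Spread starts evenly from the first to the last possible position.
            # Duplicate positions collapse, so fewer probes may be returned
            # when the sequence is too short for n_probes distinct starts.
            starts = sorted({i * last_start // (n_probes - 1) for i in range(n_probes)})
    return [(s, sequence[s:s + probe_length]) for s in starts]


def gc_content(probe):
    """Fraction of G and C bases in a probe (upper-case DNA assumed)."""
    return (probe.count("G") + probe.count("C")) / len(probe)


# Made-up example sequence of 12 bases.
by_step = tile_probes("ACGTACGTACGT", probe_length=4, tile_step=4)
by_count = tile_probes("ACGTACGTACGT", probe_length=4, n_probes=5)
```

A real design tool would additionally filter probes on properties such as sequence similarity and thermodynamic parameters; `gc_content` only shows the simplest such filter criterion.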
Publications.
1. Chikhovskaya JV, Jonker MJ, Meissner A, Breit TM, Repping S, van Pelt AMM. Human testis-derived embryonic stem cell-like cells are not pluripotent, but possess potential of mesenchymal progenitors. Human Reproduction. 2012 Jan;27(1):210-21.
2. Schaap MM, Zwart EP, Wackers PF, Huijskens I, van de Water B, Breit TM, van Steeg H, Jonker MJ, Luijten M. Dissecting modes of action of non-genotoxic carcinogens in primary mouse hepatocytes. Arch Toxicol. 2012 Nov;86(11):1717-27. doi: 10.1007/s00204-012-0883-6. Epub 2012 Jun 19.
3. Marinkovic M, de Leeuw WC, de Jong M, Kraak MH, Admiraal W, Breit TM, Jonker MJ. Combining next-generation sequencing and microarray technology into a transcriptomics approach for the non-model organism Chironomus riparius. PLoS One. 2012;7(10):e48096. doi: 10.1371/journal.pone.0048096. Epub 2012 Oct 25.
4. Marinkovic M, de Leeuw WC, Ensink WA, de Jong M, Breit TM, Admiraal W, Kraak MH, Jonker MJ. Gene expression patterns and life cycle responses of toxicant-exposed chironomids. Environ Sci Technol. 2012 Nov 20;46(22):12679-86. doi: 10.1021/es3033617. Epub 2012 Nov 9.
5. Marinkovic M, de Bruijn K, Asselman M, Bogaert M, Jonker MJ, Kraak MH, Admiraal W. Response of the nonbiting midge Chironomus riparius to multigeneration toxicant exposure. Environ Sci Technol. 2012 Nov 6;46(21):12105-11. doi: 10.1021/es300421r. Epub 2012 Oct 19.
6. Doroszuk A, Jonker MJ, Pul N, Breit TM, Zwaan BJ. Transcriptome analysis of a long-lived natural Drosophila variant: a prominent role of stress- and reproduction-genes in lifespan extension. BMC Genomics. 2012 May 4;13:167. doi: 10.1186/1471-2164-13-167.
7. Roeschmann K, Luiten S, Jonker M, Breit TM, Fokkens W, Petersen A, van Drunen. Timothy Grass pollen extract-induced gene expression and signaling pathways in airway epithelial cells. Clin Exp Allergy. 2011 Jun;41(6):830-41.
8. Yuan X, Jonker MJ, de Wilde J, Verhoef A, Wittink FR, van Benthem J, Bessems JG, Hakkert BC, Kuiper RV, van Steeg H, Breit TM, Luijten M. Finding maximal transcriptome differences between reprotoxic and non-reprotoxic phthalate responses in rat testis. J Appl Toxicol. 2011 Jul;31(5):421-30. doi: 10.1002/jat.1601. Epub 2010 Nov 9.
9. de Bekker C, Bruning O, Jonker MJ, Breit TM, Woesten HA. Single cell transcriptomics of neighboring hyphae of Aspergillus niger. Genome Biol. 2011 Aug 4;12(8):R71. doi: 10.1186/gb-2011-12-8-r71.
10. Hakvoort TB, Moerland PD, Frijters R, Sokolović A, Labruyere WT, Vermeulen JL, Ver Loren van Themaat E, Breit TM, Wittink FR, van Kampen AH, Verhoeven AJ, Lamers WH, Sokolović M. Interorgan coordination of the murine adaptive response to fasting. J Biol Chem. 2011 May 6;286(18):16332-43. doi: 10.1074/jbc.M110.216986. Epub 2011 Mar 10.
Linda Bakker, Wim de Leeuw, Han Rauwerda, Mark de Jong, Oscar Bruning, Timo Breit
MAD/IBU is a transcriptomics expert centre. MAD/IBU participates in a number of scientific projects, such as the Concord MRSA FP7 EU project, NBIC BioAssist 8.1, NBIC BioRange 4.1, BiGGrid's e-BioGrid and TTI-GG Virtual Lab for Plant Breeding. MAD/IBU is also the microarray facility of the University of Amsterdam, offers bioinformatics support in transcriptomics projects and provides bachelor as well as master education in the area of transcriptomics. In all of these projects and activities MAD/IBU needs high performance computing. In general, initial analyses are performed on machines owned by MAD/IBU. However, these machines are far from sufficient to analyse full (transcriptomics) datasets with state-of-the-art methodology. On the other hand, MAD/IBU does not need continuous use of HPC equipment, and the software MAD/IBU uses is very diverse. For these reasons the HPC cloud suits our needs very well, and we therefore apply for compute time and storage on the HPC Cloud. The main applications for which we need HPC are mixed effect model analysis of microarray data, (multi-strain) microarray design, sequence alignment using non-redundant Genbank freezes, transcription factor binding site discovery, Next Generation de novo assembly, SNP-calling, re-sequencing (read-mapping) and RNA-seq tooling.
Timo Breit, University of Amsterdam
Timo Breit, Linda Bakker, Oskar Bruning, Martijs Jonker, Mattias Kuzak, Wim de Leeuw, Han Rauwerda