
Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

Bibliographic Details
Main authors: Kawalia, Amit, Motameny, Susanne, Wonczak, Stephan, Thiele, Holger, Nieroda, Lech, Jabbari, Kamel, Borowski, Stefan, Sinha, Vishal, Gunia, Wilfried, Lang, Ulrich, Achter, Viktor, Nürnberg, Peter
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4420499/
https://www.ncbi.nlm.nih.gov/pubmed/25942438
http://dx.doi.org/10.1371/journal.pone.0126321
Collection: PubMed

Abstract: Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in a rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files.

Record ID: pubmed-4420499
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article)
Publication date: 2015-05-05
Copyright: © 2015 Kawalia et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.