Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are requ...
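Main authors: | Kawalia, Amit; Motameny, Susanne; Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter |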
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4420499/ https://www.ncbi.nlm.nih.gov/pubmed/25942438 http://dx.doi.org/10.1371/journal.pone.0126321 |
_version_ | 1782369737236283392 |
---|---|
author | Kawalia, Amit Motameny, Susanne Wonczak, Stephan Thiele, Holger Nieroda, Lech Jabbari, Kamel Borowski, Stefan Sinha, Vishal Gunia, Wilfried Lang, Ulrich Achter, Viktor Nürnberg, Peter |
author_facet | Kawalia, Amit Motameny, Susanne Wonczak, Stephan Thiele, Holger Nieroda, Lech Jabbari, Kamel Borowski, Stefan Sinha, Vishal Gunia, Wilfried Lang, Ulrich Achter, Viktor Nürnberg, Peter |
author_sort | Kawalia, Amit |
collection | PubMed |
description | Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. |
format | Online Article Text |
id | pubmed-4420499 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-44204992015-05-12 Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow Kawalia, Amit Motameny, Susanne Wonczak, Stephan Thiele, Holger Nieroda, Lech Jabbari, Kamel Borowski, Stefan Sinha, Vishal Gunia, Wilfried Lang, Ulrich Achter, Viktor Nürnberg, Peter PLoS One Research Article Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. 
Public Library of Science 2015-05-05 /pmc/articles/PMC4420499/ /pubmed/25942438 http://dx.doi.org/10.1371/journal.pone.0126321 Text en © 2015 Kawalia et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4420499/ https://www.ncbi.nlm.nih.gov/pubmed/25942438 http://dx.doi.org/10.1371/journal.pone.0126321 |