PoPLAR: Portal for Petascale Lifescience Applications and Research

BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasi...

Bibliographic Details
Main Authors: Rekapalli, Bhanu; Giblock, Paul; Reardon, Christopher
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3698029/
https://www.ncbi.nlm.nih.gov/pubmed/23902523
http://dx.doi.org/10.1186/1471-2105-14-S9-S3
_version_ 1782275227423604736
author Rekapalli, Bhanu
Giblock, Paul
Reardon, Christopher
author_facet Rekapalli, Bhanu
Giblock, Paul
Reardon, Christopher
author_sort Rekapalli, Bhanu
collection PubMed
description BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. METHODS: The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. RESULTS: This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. 
CONCLUSIONS: This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers.
format Online
Article
Text
id pubmed-3698029
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3698029 2013-07-02 PoPLAR: Portal for Petascale Lifescience Applications and Research Rekapalli, Bhanu Giblock, Paul Reardon, Christopher BMC Bioinformatics Methodology Article BioMed Central 2013-06-28 /pmc/articles/PMC3698029/ /pubmed/23902523 http://dx.doi.org/10.1186/1471-2105-14-S9-S3 Text en Copyright © 2013 Rekapalli et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Rekapalli, Bhanu
Giblock, Paul
Reardon, Christopher
PoPLAR: Portal for Petascale Lifescience Applications and Research
title PoPLAR: Portal for Petascale Lifescience Applications and Research
title_full PoPLAR: Portal for Petascale Lifescience Applications and Research
title_fullStr PoPLAR: Portal for Petascale Lifescience Applications and Research
title_full_unstemmed PoPLAR: Portal for Petascale Lifescience Applications and Research
title_short PoPLAR: Portal for Petascale Lifescience Applications and Research
title_sort poplar: portal for petascale lifescience applications and research
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3698029/
https://www.ncbi.nlm.nih.gov/pubmed/23902523
http://dx.doi.org/10.1186/1471-2105-14-S9-S3
work_keys_str_mv AT rekapallibhanu poplarportalforpetascalelifescienceapplicationsandresearch
AT giblockpaul poplarportalforpetascalelifescienceapplicationsandresearch
AT reardonchristopher poplarportalforpetascalelifescienceapplicationsandresearch