PoPLAR: Portal for Petascale Lifescience Applications and Research
BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasi...
Main Authors: | Rekapalli, Bhanu; Giblock, Paul; Reardon, Christopher |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3698029/ https://www.ncbi.nlm.nih.gov/pubmed/23902523 http://dx.doi.org/10.1186/1471-2105-14-S9-S3 |
_version_ | 1782275227423604736 |
---|---|
author | Rekapalli, Bhanu Giblock, Paul Reardon, Christopher |
author_facet | Rekapalli, Bhanu Giblock, Paul Reardon, Christopher |
author_sort | Rekapalli, Bhanu |
collection | PubMed |
description | BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. METHODS: The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. RESULTS: This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. CONCLUSIONS: This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers. |
format | Online Article Text |
id | pubmed-3698029 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3698029 2013-07-02 PoPLAR: Portal for Petascale Lifescience Applications and Research Rekapalli, Bhanu Giblock, Paul Reardon, Christopher BMC Bioinformatics Methodology Article BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. METHODS: The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. RESULTS: This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. CONCLUSIONS: This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers. BioMed Central 2013-06-28 /pmc/articles/PMC3698029/ /pubmed/23902523 http://dx.doi.org/10.1186/1471-2105-14-S9-S3 Text en Copyright © 2013 Rekapalli et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Rekapalli, Bhanu Giblock, Paul Reardon, Christopher PoPLAR: Portal for Petascale Lifescience Applications and Research |
title | PoPLAR: Portal for Petascale Lifescience Applications and Research |
title_full | PoPLAR: Portal for Petascale Lifescience Applications and Research |
title_fullStr | PoPLAR: Portal for Petascale Lifescience Applications and Research |
title_full_unstemmed | PoPLAR: Portal for Petascale Lifescience Applications and Research |
title_short | PoPLAR: Portal for Petascale Lifescience Applications and Research |
title_sort | poplar: portal for petascale lifescience applications and research |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3698029/ https://www.ncbi.nlm.nih.gov/pubmed/23902523 http://dx.doi.org/10.1186/1471-2105-14-S9-S3 |
work_keys_str_mv | AT rekapallibhanu poplarportalforpetascalelifescienceapplicationsandresearch AT giblockpaul poplarportalforpetascalelifescienceapplicationsandresearch AT reardonchristopher poplarportalforpetascalelifescienceapplicationsandresearch |