
PoPLAR: Portal for Petascale Lifescience Applications and Research

BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasi...

Bibliographic Details
Main Authors:	Rekapalli, Bhanu, Giblock, Paul, Reardon, Christopher
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3698029/ https://www.ncbi.nlm.nih.gov/pubmed/23902523 http://dx.doi.org/10.1186/1471-2105-14-S9-S3

_version_	1782275227423604736
author	Rekapalli, Bhanu Giblock, Paul Reardon, Christopher
author_facet	Rekapalli, Bhanu Giblock, Paul Reardon, Christopher
author_sort	Rekapalli, Bhanu
collection	PubMed
description	BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. METHODS: The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. RESULTS: This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. 
CONCLUSIONS: This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers.
format	Online Article Text
id	pubmed-3698029
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-36980292013-07-02 PoPLAR: Portal for Petascale Lifescience Applications and Research Rekapalli, Bhanu Giblock, Paul Reardon, Christopher BMC Bioinformatics Methodology Article BACKGROUND: We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. METHODS: The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. 
RESULTS: This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. CONCLUSIONS: This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers. BioMed Central 2013-06-28 /pmc/articles/PMC3698029/ /pubmed/23902523 http://dx.doi.org/10.1186/1471-2105-14-S9-S3 Text en Copyright © 2013 Rekapalli et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Rekapalli, Bhanu Giblock, Paul Reardon, Christopher PoPLAR: Portal for Petascale Lifescience Applications and Research
title	PoPLAR: Portal for Petascale Lifescience Applications and Research
title_full	PoPLAR: Portal for Petascale Lifescience Applications and Research
title_fullStr	PoPLAR: Portal for Petascale Lifescience Applications and Research
title_full_unstemmed	PoPLAR: Portal for Petascale Lifescience Applications and Research
title_short	PoPLAR: Portal for Petascale Lifescience Applications and Research
title_sort	poplar: portal for petascale lifescience applications and research
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3698029/ https://www.ncbi.nlm.nih.gov/pubmed/23902523 http://dx.doi.org/10.1186/1471-2105-14-S9-S3
work_keys_str_mv	AT rekapallibhanu poplarportalforpetascalelifescienceapplicationsandresearch AT giblockpaul poplarportalforpetascalelifescienceapplicationsandresearch AT reardonchristopher poplarportalforpetascalelifescienceapplicationsandresearch
