Cloud computing for comparative genomics

Bibliographic Details
Main authors: Wall, Dennis P, Kudtarkar, Parul, Fusaro, Vincent A, Pivovarov, Rimma, Patil, Prasad, Tonellato, Peter J
Format: Text
Language: English
Published: BioMed Central 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098063/
https://www.ncbi.nlm.nih.gov/pubmed/20482786
http://dx.doi.org/10.1186/1471-2105-11-259
collection PubMed
description BACKGROUND: Large comparative genomics studies and tools are becoming increasingly compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive as this number grows, especially as the breadth of questions asked continues to widen. Alternative computing architectures, in particular cloud computing environments, may help alleviate this pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Compute Cloud (EC2). We then employed the resulting RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. RESULTS: We ran more than 300,000 RSD-cloud processes within EC2. These jobs were farmed simultaneously to 100 high-capacity compute nodes using Amazon Web Services Elastic MapReduce and included a wide mix of large and small genomes. The total computation took just under 70 hours and cost US$6,302. CONCLUSIONS: The effort to move existing comparative genomics algorithms from local compute infrastructures to the cloud is not trivial. However, the speed and flexibility of cloud computing environments provide a substantial boost at manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
id pubmed-3098063
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3098063 2011-05-20 Cloud computing for comparative genomics Wall, Dennis P Kudtarkar, Parul Fusaro, Vincent A Pivovarov, Rimma Patil, Prasad Tonellato, Peter J BMC Bioinformatics Methodology Article BioMed Central 2010-05-18 /pmc/articles/PMC3098063/ /pubmed/20482786 http://dx.doi.org/10.1186/1471-2105-11-259 Text en Copyright ©2010 Wall et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article
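The run-level totals in the abstract (more than 300,000 RSD-cloud processes, 100 nodes, just under 70 hours, US$6,302) allow a quick back-of-envelope check. The sketch below is not from the paper. It only divides the reported cost by the reported job count and by a node-hour figure, assuming that all 100 nodes were billed for the full ~70 hours and that the whole bill was compute. The constant names and the `summarize` function are illustrative, and because the inputs are rounded, the outputs are approximate.

```python
# Back-of-envelope arithmetic on the totals reported in the abstract.
# Assumptions (not stated in the paper): every one of the 100 nodes was
# billed for the full wall-clock time, and the whole bill was compute.
# Inputs are the abstract's rounded figures, so outputs are approximate.

JOBS = 300_000            # "more than 300,000 RSD-cloud processes" (lower bound)
NODES = 100               # "100 high capacity compute nodes"
WALL_HOURS = 70.0         # "just under 70 hours" (upper bound)
TOTAL_COST_USD = 6_302.0  # reported total cost


def summarize(jobs: int, nodes: int, wall_hours: float, cost_usd: float) -> dict[str, float]:
    """Derive per-job and per-node-hour figures from run-level totals."""
    node_hours = nodes * wall_hours  # assumes all nodes ran for the whole run
    return {
        "node_hours": node_hours,
        "cost_per_job_usd": cost_usd / jobs,
        "cost_per_node_hour_usd": cost_usd / node_hours,
        "jobs_per_node_hour": jobs / node_hours,
    }


if __name__ == "__main__":
    for name, value in summarize(JOBS, NODES, WALL_HOURS, TOTAL_COST_USD).items():
        print(f"{name}: {value:.3f}")
```

Run as a script, it reports about 7,000 node-hours, roughly $0.021 per job and about $0.90 per node-hour. Since the job count is a lower bound, the true per-job cost would be slightly below this estimate.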