Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services
OBJECTIVE: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. METHODS: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct...
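Main authors: | Krissaane, Inès; De Niz, Carlos; Gutiérrez-Sacristán, Alba; Korodi, Gabor; Ede, Nneka; Kumar, Ranjay; Lyons, Jessica; Manrai, Arjun; Patel, Chirag; Kohane, Isaac; Avillach, Paul |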
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Brief Communications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7534581/ https://www.ncbi.nlm.nih.gov/pubmed/32719837 http://dx.doi.org/10.1093/jamia/ocaa068 |
_version_ | 1783590338200338432 |
---|---|
author | Krissaane, Inès De Niz, Carlos Gutiérrez-Sacristán, Alba Korodi, Gabor Ede, Nneka Kumar, Ranjay Lyons, Jessica Manrai, Arjun Patel, Chirag Kohane, Isaac Avillach, Paul |
author_facet | Krissaane, Inès De Niz, Carlos Gutiérrez-Sacristán, Alba Korodi, Gabor Ede, Nneka Kumar, Ranjay Lyons, Jessica Manrai, Arjun Patel, Chirag Kohane, Isaac Avillach, Paul |
author_sort | Krissaane, Inès |
collection | PubMed |
description | OBJECTIVE: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. METHODS: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680) for analysis and exploration of genomic variants dataset. RESULTS: Comparative evaluation of numerous cloud-based cluster configurations demonstrate a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. CONCLUSIONS: We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost? |
format | Online Article Text |
id | pubmed-7534581 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-75345812020-10-09 Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services Krissaane, Inès De Niz, Carlos Gutiérrez-Sacristán, Alba Korodi, Gabor Ede, Nneka Kumar, Ranjay Lyons, Jessica Manrai, Arjun Patel, Chirag Kohane, Isaac Avillach, Paul J Am Med Inform Assoc Brief Communications OBJECTIVE: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. METHODS: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680) for analysis and exploration of genomic variants dataset. RESULTS: Comparative evaluation of numerous cloud-based cluster configurations demonstrate a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. CONCLUSIONS: We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost? Oxford University Press 2020-07-27 /pmc/articles/PMC7534581/ /pubmed/32719837 http://dx.doi.org/10.1093/jamia/ocaa068 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Brief Communications Krissaane, Inès De Niz, Carlos Gutiérrez-Sacristán, Alba Korodi, Gabor Ede, Nneka Kumar, Ranjay Lyons, Jessica Manrai, Arjun Patel, Chirag Kohane, Isaac Avillach, Paul Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services |
title | Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services |
title_sort | scalability and cost-effectiveness analysis of whole genome-wide association studies on google cloud platform and amazon web services |
topic | Brief Communications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7534581/ https://www.ncbi.nlm.nih.gov/pubmed/32719837 http://dx.doi.org/10.1093/jamia/ocaa068 |