Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services

OBJECTIVE: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. METHODS: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct...

Bibliographic Details
Main Authors: Krissaane, Inès; De Niz, Carlos; Gutiérrez-Sacristán, Alba; Korodi, Gabor; Ede, Nneka; Kumar, Ranjay; Lyons, Jessica; Manrai, Arjun; Patel, Chirag; Kohane, Isaac; Avillach, Paul
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7534581/
https://www.ncbi.nlm.nih.gov/pubmed/32719837
http://dx.doi.org/10.1093/jamia/ocaa068
author Krissaane, Inès
De Niz, Carlos
Gutiérrez-Sacristán, Alba
Korodi, Gabor
Ede, Nneka
Kumar, Ranjay
Lyons, Jessica
Manrai, Arjun
Patel, Chirag
Kohane, Isaac
Avillach, Paul
collection PubMed
description OBJECTIVE: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. METHODS: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680), for analysis and exploration of genomic variant datasets. RESULTS: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. CONCLUSIONS: We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?
format Online
Article
Text
id pubmed-7534581
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7534581 2020-10-09 J Am Med Inform Assoc Brief Communications Oxford University Press 2020-07-27 /pmc/articles/PMC7534581/ /pubmed/32719837 http://dx.doi.org/10.1093/jamia/ocaa068 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services
topic Brief Communications