
Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services

Bibliographic Details
Main authors: Krissaane, Inès; De Niz, Carlos; Gutiérrez-Sacristán, Alba; Korodi, Gabor; Ede, Nneka; Kumar, Ranjay; Lyons, Jessica; Manrai, Arjun; Patel, Chirag; Kohane, Isaac; Avillach, Paul
Format: Online article, text
Language: English
Published: Oxford University Press, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7534581/
https://www.ncbi.nlm.nih.gov/pubmed/32719837
http://dx.doi.org/10.1093/jamia/ocaa068
Description
Summary: OBJECTIVE: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. METHODS: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We used Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680), for the analysis and exploration of genomic variant datasets. RESULTS: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. CONCLUSIONS: We present a timely answer to one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?