DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework

Bibliographic Details
Main Authors: Huang, Po-Jung, Chang, Jui-Huan, Lin, Hou-Hsien, Li, Yu-Xuan, Lee, Chi-Ching, Su, Chung-Tsai, Li, Yun-Lung, Chang, Ming-Tai, Weng, Sid, Cheng, Wei-Hung, Chiu, Cheng-Hsun, Tang, Petrus
Format: Online Article Text
Language: English
Published: Hindawi 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7481958/
https://www.ncbi.nlm.nih.gov/pubmed/32952600
http://dx.doi.org/10.1155/2020/7231205
author Huang, Po-Jung
Chang, Jui-Huan
Lin, Hou-Hsien
Li, Yu-Xuan
Lee, Chi-Ching
Su, Chung-Tsai
Li, Yun-Lung
Chang, Ming-Tai
Weng, Sid
Cheng, Wei-Hung
Chiu, Cheng-Hsun
Tang, Petrus
collection PubMed
description Although sequencing a human genome has become affordable, identifying genetic variants from whole-genome sequence data is still a hurdle for researchers without adequate computing equipment or bioinformatics support. GATK has been a gold standard method for identifying genetic variants and has been widely used in genome projects and population genetic studies for many years. More recently, the Google Brain team developed a new method, DeepVariant, which uses deep neural networks to build an image classification model that identifies genetic variants. However, the superior accuracy of DeepVariant comes at the cost of computational intensity, which largely constrains its applications. Accordingly, we present DeepVariant-on-Spark to optimize resource allocation, enable multi-GPU support, and accelerate the processing of the DeepVariant pipeline. To make DeepVariant-on-Spark more accessible, we have deployed it on the Google Cloud Platform (GCP). By following our instructions, users can deploy DeepVariant-on-Spark on the GCP within 20 minutes and start analyzing at least ten whole-genome sequencing datasets using the free credits provided by the GCP. DeepVariant-on-Spark is freely available for small-scale genome analysis using a cloud-based computing framework, which makes it suitable for pilot testing or preliminary studies while retaining the flexibility and scalability needed for large-scale sequencing projects.
format Online
Article
Text
id pubmed-7481958
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-7481958 2020-09-18 Comput Math Methods Med Research Article Hindawi 2020-09-01 /pmc/articles/PMC7481958/ /pubmed/32952600 http://dx.doi.org/10.1155/2020/7231205 Text en Copyright © 2020 Po-Jung Huang et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7481958/
https://www.ncbi.nlm.nih.gov/pubmed/32952600
http://dx.doi.org/10.1155/2020/7231205