Accurate, scalable cohort variant calls using DeepVariant and GLnexus

Bibliographic Details
Main authors: Yun, Taedong; Li, Helen; Chang, Pi-Chuan; Lin, Michael F; Carroll, Andrew; McLean, Cory Y
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8023681/
https://www.ncbi.nlm.nih.gov/pubmed/33399819
http://dx.doi.org/10.1093/bioinformatics/btaa1081
author Yun, Taedong
Li, Helen
Chang, Pi-Chuan
Lin, Michael F
Carroll, Andrew
McLean, Cory Y
collection PubMed
description MOTIVATION: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging. RESULTS: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline. AVAILABILITY AND IMPLEMENTATION: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8023681
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8023681 2021-04-13 Bioinformatics Original Papers Oxford University Press 2021-01-05 /pmc/articles/PMC8023681/ /pubmed/33399819 http://dx.doi.org/10.1093/bioinformatics/btaa1081 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Accurate, scalable cohort variant calls using DeepVariant and GLnexus
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8023681/
https://www.ncbi.nlm.nih.gov/pubmed/33399819
http://dx.doi.org/10.1093/bioinformatics/btaa1081