Accurate, scalable cohort variant calls using DeepVariant and GLnexus
Main authors: | Yun, Taedong; Li, Helen; Chang, Pi-Chuan; Lin, Michael F; Carroll, Andrew; McLean, Cory Y |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8023681/ https://www.ncbi.nlm.nih.gov/pubmed/33399819 http://dx.doi.org/10.1093/bioinformatics/btaa1081 |
author | Yun, Taedong; Li, Helen; Chang, Pi-Chuan; Lin, Michael F; Carroll, Andrew; McLean, Cory Y |
collection | PubMed |
description | MOTIVATION: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging. RESULTS: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline. AVAILABILITY AND IMPLEMENTATION: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
id | pubmed-8023681 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-8023681 (2021-04-13). Bioinformatics, Original Papers. Oxford University Press, 2021-01-05. © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
topic | Original Papers |