
Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that differen...


Bibliographic Details
Main authors: Regier, Allison A., Farjoun, Yossi, Larson, David E., Krasheninina, Olga, Kang, Hyun Min, Howrigan, Daniel P., Chen, Bo-Juen, Kher, Manisha, Banks, Eric, Ames, Darren C., English, Adam C., Li, Heng, Xing, Jinchuan, Zhang, Yeting, Matise, Tara, Abecasis, Goncalo R., Salerno, Will, Zody, Michael C., Neale, Benjamin M., Hall, Ira M.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6168605/
https://www.ncbi.nlm.nih.gov/pubmed/30279509
http://dx.doi.org/10.1038/s41467-018-06159-4
author Regier, Allison A.
Farjoun, Yossi
Larson, David E.
Krasheninina, Olga
Kang, Hyun Min
Howrigan, Daniel P.
Chen, Bo-Juen
Kher, Manisha
Banks, Eric
Ames, Darren C.
English, Adam C.
Li, Heng
Xing, Jinchuan
Zhang, Yeting
Matise, Tara
Abecasis, Goncalo R.
Salerno, Will
Zody, Michael C.
Neale, Benjamin M.
Hall, Ira M.
collection PubMed
description Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.
format Online
Article
Text
id pubmed-6168605
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6168605 2018-10-04 Nat Commun Article
Nature Publishing Group UK 2018-10-02 /pmc/articles/PMC6168605/ /pubmed/30279509 http://dx.doi.org/10.1038/s41467-018-06159-4 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6168605/
https://www.ncbi.nlm.nih.gov/pubmed/30279509
http://dx.doi.org/10.1038/s41467-018-06159-4