
Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects


Bibliographic Details
Main authors: Regier, Allison A., Farjoun, Yossi, Larson, David E., Krasheninina, Olga, Kang, Hyun Min, Howrigan, Daniel P., Chen, Bo-Juen, Kher, Manisha, Banks, Eric, Ames, Darren C., English, Adam C., Li, Heng, Xing, Jinchuan, Zhang, Yeting, Matise, Tara, Abecasis, Goncalo R., Salerno, Will, Zody, Michael C., Neale, Benjamin M., Hall, Ira M.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6168605/
https://www.ncbi.nlm.nih.gov/pubmed/30279509
http://dx.doi.org/10.1038/s41467-018-06159-4
Description
Summary: Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.