An open resource for accurately benchmarking small variant and reference calls

Bibliographic Details
Main authors: Zook, Justin M., McDaniel, Jennifer, Olson, Nathan D., Wagner, Justin, Parikh, Hemang, Heaton, Haynes, Irvine, Sean A., Trigg, Len, Truty, Rebecca, McLean, Cory Y., De La Vega, Francisco M., Xiao, Chunlin, Sherry, Stephen, Salit, Marc
Format: Online Article Text
Language: English
Published: 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6500473/
https://www.ncbi.nlm.nih.gov/pubmed/30936564
http://dx.doi.org/10.1038/s41587-019-0074-6
author Zook, Justin M.
McDaniel, Jennifer
Olson, Nathan D.
Wagner, Justin
Parikh, Hemang
Heaton, Haynes
Irvine, Sean A.
Trigg, Len
Truty, Rebecca
McLean, Cory Y.
De La Vega, Francisco M.
Xiao, Chunlin
Sherry, Stephen
Salit, Marc
collection PubMed
description Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle Consortium (GIAB), we apply a reproducible, cloud-based pipeline to integrate multiple short and linked read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six broadly-consented genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a ‘first of its kind’ resource that is available to the community for multiple downstream applications. We produce 17% more benchmark SNVs, 176% more indels, and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.
format Online
Article
Text
id pubmed-6500473
institution National Center for Biotechnology Information
language English
publishDate 2019
record_format MEDLINE/PubMed
spelling pubmed-6500473 2019-10-01 An open resource for accurately benchmarking small variant and reference calls Nat Biotechnol Article 2019-04-01 2019-05 /pmc/articles/PMC6500473/ /pubmed/30936564 http://dx.doi.org/10.1038/s41587-019-0074-6 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms
title An open resource for accurately benchmarking small variant and reference calls
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6500473/
https://www.ncbi.nlm.nih.gov/pubmed/30936564
http://dx.doi.org/10.1038/s41587-019-0074-6