An open resource for accurately benchmarking small variant and reference calls
Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle Consortium (GIAB), we apply a reproducible, cloud-based pipeline to integrate multiple short and linked read sequencin...
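Main authors: | Zook, Justin M.; McDaniel, Jennifer; Olson, Nathan D.; Wagner, Justin; Parikh, Hemang; Heaton, Haynes; Irvine, Sean A.; Trigg, Len; Truty, Rebecca; McLean, Cory Y.; De La Vega, Francisco M.; Xiao, Chunlin; Sherry, Stephen; Salit, Marc |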
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6500473/ https://www.ncbi.nlm.nih.gov/pubmed/30936564 http://dx.doi.org/10.1038/s41587-019-0074-6 |
_version_ | 1783415955854983168 |
---|---|
author | Zook, Justin M. McDaniel, Jennifer Olson, Nathan D. Wagner, Justin Parikh, Hemang Heaton, Haynes Irvine, Sean A. Trigg, Len Truty, Rebecca McLean, Cory Y. De La Vega, Francisco M. Xiao, Chunlin Sherry, Stephen Salit, Marc |
author_facet | Zook, Justin M. McDaniel, Jennifer Olson, Nathan D. Wagner, Justin Parikh, Hemang Heaton, Haynes Irvine, Sean A. Trigg, Len Truty, Rebecca McLean, Cory Y. De La Vega, Francisco M. Xiao, Chunlin Sherry, Stephen Salit, Marc |
author_sort | Zook, Justin M. |
collection | PubMed |
description | Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle Consortium (GIAB), we apply a reproducible, cloud-based pipeline to integrate multiple short and linked read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six broadly-consented genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a ‘first of its kind’ resource that is available to the community for multiple downstream applications. We produce 17% more benchmark SNVs, 176% more indels, and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context. |
format | Online Article Text |
id | pubmed-6500473 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
record_format | MEDLINE/PubMed |
spelling | pubmed-6500473 2019-10-01 An open resource for accurately benchmarking small variant and reference calls Zook, Justin M. McDaniel, Jennifer Olson, Nathan D. Wagner, Justin Parikh, Hemang Heaton, Haynes Irvine, Sean A. Trigg, Len Truty, Rebecca McLean, Cory Y. De La Vega, Francisco M. Xiao, Chunlin Sherry, Stephen Salit, Marc Nat Biotechnol Article Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle Consortium (GIAB), we apply a reproducible, cloud-based pipeline to integrate multiple short and linked read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six broadly-consented genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a ‘first of its kind’ resource that is available to the community for multiple downstream applications. We produce 17% more benchmark SNVs, 176% more indels, and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context. 2019-04-01 2019-05 /pmc/articles/PMC6500473/ /pubmed/30936564 http://dx.doi.org/10.1038/s41587-019-0074-6 Text en Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
spellingShingle | Article Zook, Justin M. McDaniel, Jennifer Olson, Nathan D. Wagner, Justin Parikh, Hemang Heaton, Haynes Irvine, Sean A. Trigg, Len Truty, Rebecca McLean, Cory Y. De La Vega, Francisco M. Xiao, Chunlin Sherry, Stephen Salit, Marc An open resource for accurately benchmarking small variant and reference calls |
title | An open resource for accurately benchmarking small variant and reference calls |
title_full | An open resource for accurately benchmarking small variant and reference calls |
title_fullStr | An open resource for accurately benchmarking small variant and reference calls |
title_full_unstemmed | An open resource for accurately benchmarking small variant and reference calls |
title_short | An open resource for accurately benchmarking small variant and reference calls |
title_sort | open resource for accurately benchmarking small variant and reference calls |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6500473/ https://www.ncbi.nlm.nih.gov/pubmed/30936564 http://dx.doi.org/10.1038/s41587-019-0074-6 |