Benchmarking challenging small variants with linked and long reads
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challe...
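Main authors: | Wagner, Justin; Olson, Nathan D.; Harris, Lindsay; Khan, Ziad; Farek, Jesse; Mahmoud, Medhat; Stankovic, Ana; Kovacevic, Vladimir; Yoo, Byunggil; Miller, Neil; Rosenfeld, Jeffrey A.; Ni, Bohan; Zarate, Samantha; Kirsche, Melanie; Aganezov, Sergey; Schatz, Michael C.; Narzisi, Giuseppe; Byrska-Bishop, Marta; Clarke, Wayne; Evani, Uday S.; Markello, Charles; Shafin, Kishwar; Zhou, Xin; Sidow, Arend; Bansal, Vikas; Ebert, Peter; Marschall, Tobias; Lansdorp, Peter; Hanlon, Vincent; Mattsson, Carl-Adam; Barrio, Alvaro Martinez; Fiddes, Ian T.; Xiao, Chunlin; Fungtammasan, Arkarachai; Chin, Chen-Shan; Wenger, Aaron M.; Rowell, William J.; Sedlazeck, Fritz J.; Carroll, Andrew; Salit, Marc; Zook, Justin M. |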
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9706577/ https://www.ncbi.nlm.nih.gov/pubmed/36452119 http://dx.doi.org/10.1016/j.xgen.2022.100128 |
_version_ | 1784840534527836160 |
---|---|
author | Wagner, Justin Olson, Nathan D. Harris, Lindsay Khan, Ziad Farek, Jesse Mahmoud, Medhat Stankovic, Ana Kovacevic, Vladimir Yoo, Byunggil Miller, Neil Rosenfeld, Jeffrey A. Ni, Bohan Zarate, Samantha Kirsche, Melanie Aganezov, Sergey Schatz, Michael C. Narzisi, Giuseppe Byrska-Bishop, Marta Clarke, Wayne Evani, Uday S. Markello, Charles Shafin, Kishwar Zhou, Xin Sidow, Arend Bansal, Vikas Ebert, Peter Marschall, Tobias Lansdorp, Peter Hanlon, Vincent Mattsson, Carl-Adam Barrio, Alvaro Martinez Fiddes, Ian T. Xiao, Chunlin Fungtammasan, Arkarachai Chin, Chen-Shan Wenger, Aaron M. Rowell, William J. Sedlazeck, Fritz J. Carroll, Andrew Salit, Marc Zook, Justin M. |
author_facet | Wagner, Justin Olson, Nathan D. Harris, Lindsay Khan, Ziad Farek, Jesse Mahmoud, Medhat Stankovic, Ana Kovacevic, Vladimir Yoo, Byunggil Miller, Neil Rosenfeld, Jeffrey A. Ni, Bohan Zarate, Samantha Kirsche, Melanie Aganezov, Sergey Schatz, Michael C. Narzisi, Giuseppe Byrska-Bishop, Marta Clarke, Wayne Evani, Uday S. Markello, Charles Shafin, Kishwar Zhou, Xin Sidow, Arend Bansal, Vikas Ebert, Peter Marschall, Tobias Lansdorp, Peter Hanlon, Vincent Mattsson, Carl-Adam Barrio, Alvaro Martinez Fiddes, Ian T. Xiao, Chunlin Fungtammasan, Arkarachai Chin, Chen-Shan Wenger, Aaron M. Rowell, William J. Sedlazeck, Fritz J. Carroll, Andrew Salit, Marc Zook, Justin M. |
author_sort | Wagner, Justin |
collection | PubMed |
description | Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development. |
format | Online Article Text |
id | pubmed-9706577 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-97065772022-11-29 Benchmarking challenging small variants with linked and long reads Wagner, Justin Olson, Nathan D. Harris, Lindsay Khan, Ziad Farek, Jesse Mahmoud, Medhat Stankovic, Ana Kovacevic, Vladimir Yoo, Byunggil Miller, Neil Rosenfeld, Jeffrey A. Ni, Bohan Zarate, Samantha Kirsche, Melanie Aganezov, Sergey Schatz, Michael C. Narzisi, Giuseppe Byrska-Bishop, Marta Clarke, Wayne Evani, Uday S. Markello, Charles Shafin, Kishwar Zhou, Xin Sidow, Arend Bansal, Vikas Ebert, Peter Marschall, Tobias Lansdorp, Peter Hanlon, Vincent Mattsson, Carl-Adam Barrio, Alvaro Martinez Fiddes, Ian T. Xiao, Chunlin Fungtammasan, Arkarachai Chin, Chen-Shan Wenger, Aaron M. Rowell, William J. Sedlazeck, Fritz J. Carroll, Andrew Salit, Marc Zook, Justin M. Cell Genom Article Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development. 
Elsevier 2022-04-28 /pmc/articles/PMC9706577/ /pubmed/36452119 http://dx.doi.org/10.1016/j.xgen.2022.100128 Text en https://creativecommons.org/licenses/by/4.0/This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Wagner, Justin Olson, Nathan D. Harris, Lindsay Khan, Ziad Farek, Jesse Mahmoud, Medhat Stankovic, Ana Kovacevic, Vladimir Yoo, Byunggil Miller, Neil Rosenfeld, Jeffrey A. Ni, Bohan Zarate, Samantha Kirsche, Melanie Aganezov, Sergey Schatz, Michael C. Narzisi, Giuseppe Byrska-Bishop, Marta Clarke, Wayne Evani, Uday S. Markello, Charles Shafin, Kishwar Zhou, Xin Sidow, Arend Bansal, Vikas Ebert, Peter Marschall, Tobias Lansdorp, Peter Hanlon, Vincent Mattsson, Carl-Adam Barrio, Alvaro Martinez Fiddes, Ian T. Xiao, Chunlin Fungtammasan, Arkarachai Chin, Chen-Shan Wenger, Aaron M. Rowell, William J. Sedlazeck, Fritz J. Carroll, Andrew Salit, Marc Zook, Justin M. Benchmarking challenging small variants with linked and long reads |
title | Benchmarking challenging small variants with linked and long reads |
title_full | Benchmarking challenging small variants with linked and long reads |
title_fullStr | Benchmarking challenging small variants with linked and long reads |
title_full_unstemmed | Benchmarking challenging small variants with linked and long reads |
title_short | Benchmarking challenging small variants with linked and long reads |
title_sort | benchmarking challenging small variants with linked and long reads |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9706577/ https://www.ncbi.nlm.nih.gov/pubmed/36452119 http://dx.doi.org/10.1016/j.xgen.2022.100128 |