
Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall

Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and...

Bibliographic Details
Main authors: Harvey, William T., Ebert, Peter, Ebler, Jana, Audano, Peter A., Munson, Katherine M., Hoekzema, Kendra, Porubsky, David, Beck, Christine R., Marschall, Tobias, Garimella, Kiran, Eichler, Evan E.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10187267/
https://www.ncbi.nlm.nih.gov/pubmed/37205567
http://dx.doi.org/10.1101/2023.05.04.539448
author Harvey, William T.
Ebert, Peter
Ebler, Jana
Audano, Peter A.
Munson, Katherine M.
Hoekzema, Kendra
Porubsky, David
Beck, Christine R.
Marschall, Tobias
Garimella, Kiran
Eichler, Evan E.
author_sort Harvey, William T.
collection PubMed
description Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
format Online
Article
Text
id pubmed-10187267
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10187267 2023-05-17 bioRxiv Article Cold Spring Harbor Laboratory 2023-05-04 /pmc/articles/PMC10187267/ /pubmed/37205567 http://dx.doi.org/10.1101/2023.05.04.539448 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall
topic Article