Family reunion via error correction: an efficient analysis of duplex sequencing data

BACKGROUND: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PC...

Bibliographic Details
Main authors: Stoler, Nicholas; Arbeithuber, Barbara; Povysil, Gundula; Heinzl, Monika; Salazar, Renato; Makova, Kateryna D; Tiemann-Boege, Irene; Nekrutenko, Anton
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7057607/
https://www.ncbi.nlm.nih.gov/pubmed/32131723
http://dx.doi.org/10.1186/s12859-020-3419-8
_version_ 1783503697422057472
author Stoler, Nicholas
Arbeithuber, Barbara
Povysil, Gundula
Heinzl, Monika
Salazar, Renato
Makova, Kateryna D
Tiemann-Boege, Irene
Nekrutenko, Anton
author_sort Stoler, Nicholas
collection PubMed
description BACKGROUND: Duplex sequencing is the most accurate approach for identifying sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows true nucleotide substitutions to be distinguished from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole-genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away. RESULTS: In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correcting such errors allows “reuniting” these reads with their respective families, increasing the output of the method and making it more cost-effective. CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.
format Online
Article
Text
id pubmed-7057607
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7057607 2020-03-10 BMC Bioinformatics Methodology Article BioMed Central 2020-03-04 /pmc/articles/PMC7057607/ /pubmed/32131723 http://dx.doi.org/10.1186/s12859-020-3419-8 Text en © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Family reunion via error correction: an efficient analysis of duplex sequencing data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7057607/
https://www.ncbi.nlm.nih.gov/pubmed/32131723
http://dx.doi.org/10.1186/s12859-020-3419-8
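
The description above says that correcting errors in duplex tags lets singleton reads be "reunited" with their families. The following is a minimal sketch of that general idea, not the algorithm used by Du Novo 2.0: it assumes each read carries a single tag string, treats any tag seen once as a singleton, and reassigns it only when exactly one multi-read family lies within a Hamming distance threshold. The function names, the threshold `max_mismatches`, and the example tags are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def reunite_singletons(tags, max_mismatches=1):
    """Return a dict mapping every observed tag to its (possibly corrected) tag.

    Assumption for illustration: a tag seen exactly once is a singleton, and it
    is reassigned only if exactly one tag seen more than once is within
    max_mismatches. Ambiguous or distant singletons are left unchanged.
    """
    counts = Counter(tags)
    families = [tag for tag, n in counts.items() if n > 1]
    corrected = {}
    for tag, n in counts.items():
        corrected[tag] = tag
        if n > 1:
            continue
        candidates = [
            fam for fam in families
            if len(fam) == len(tag) and hamming(fam, tag) <= max_mismatches
        ]
        if len(candidates) == 1:
            corrected[tag] = candidates[0]
    return corrected


def family_sizes(tags, corrected):
    """Count reads per family after applying the tag corrections."""
    return Counter(corrected[tag] for tag in tags)


# Hypothetical tags: one singleton is a single mismatch away from a family of three.
tags = ["ACGTAC"] * 3 + ["ACGTAA"] + ["TTGCAT"] * 2 + ["GGGGGG"]
corrected = reunite_singletons(tags)
sizes = family_sizes(tags, corrected)
print(sizes)
```

In this toy input the singleton `ACGTAA` joins the `ACGTAC` family, while `GGGGGG` stays a singleton because no family is close enough. Real duplex data has a tag on each end of the fragment and must track strand order, which this sketch ignores.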