Family reunion via error correction: an efficient analysis of duplex sequencing data
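Main authors: | Stoler, Nicholas; Arbeithuber, Barbara; Povysil, Gundula; Heinzl, Monika; Salazar, Renato; Makova, Kateryna D; Tiemann-Boege, Irene; Nekrutenko, Anton |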
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7057607/ https://www.ncbi.nlm.nih.gov/pubmed/32131723 http://dx.doi.org/10.1186/s12859-020-3419-8 |
author | Stoler, Nicholas; Arbeithuber, Barbara; Povysil, Gundula; Heinzl, Monika; Salazar, Renato; Makova, Kateryna D; Tiemann-Boege, Irene; Nekrutenko, Anton |
collection | PubMed |
description | BACKGROUND: Duplex sequencing is the most accurate approach for identifying sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of the original DNA molecules, which makes it possible to distinguish true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole-genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away. RESULTS: In this paper we demonstrate that a significant fraction of these reads contain PCR or sequencing errors within their duplex tags. Correcting such errors allows these reads to be “reunited” with their respective families, increasing the output of the method and making it more cost-effective. CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo. |
format | Online Article Text |
id | pubmed-7057607 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics, Methodology Article. BioMed Central, published 2020-03-04 (pubmed-7057607, 2020-03-10) |
rights | © The Author(s). 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Family reunion via error correction: an efficient analysis of duplex sequencing data |
topic | Methodology Article |
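The description explains that singleton reads can be "reunited" with their families by correcting errors in their duplex tags. The Python sketch below illustrates the general idea under a simple, hypothetical rule: a tag observed more than once is treated as a family tag, and a singleton tag is merged into a family only when exactly one family tag lies within `max_dist` mismatches (Hamming distance). The names `hamming`, `reunite_singletons`, and `max_dist` are illustrative assumptions. This is not Du Novo 2.0's actual algorithm, and it ignores the two-part structure of real duplex tags, whose halves appear in swapped order on reads from the complementary strand.

```python
from collections import Counter
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def reunite_singletons(tags: List[str], max_dist: int = 1) -> Dict[str, str]:
    """Map each observed tag to the family tag it should be grouped under.

    Hypothetical rule for illustration: a tag seen more than once is a
    family tag; a tag seen exactly once (a singleton) is reassigned to a
    family tag within `max_dist` mismatches, but only if exactly one such
    family tag exists.
    """
    counts = Counter(tags)
    families = [t for t, n in counts.items() if n > 1]
    mapping: Dict[str, str] = {}
    for tag, n in counts.items():
        if n > 1:
            mapping[tag] = tag  # family tags map to themselves
            continue
        candidates = [
            f for f in families
            if len(f) == len(tag) and hamming(f, tag) <= max_dist
        ]
        # Reunite only when the correction is unambiguous; otherwise keep the tag.
        mapping[tag] = candidates[0] if len(candidates) == 1 else tag
    return mapping


if __name__ == "__main__":
    tags = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACGA",
            "TTTTGGGG", "TTTTGGGG", "CCCCAAAA"]
    mapping = reunite_singletons(tags)
    family_sizes = Counter(mapping[t] for t in tags)
    print(dict(family_sizes))  # {'ACGTACGT': 4, 'TTTTGGGG': 2, 'CCCCAAAA': 1}
```

Requiring a unique candidate is a conservative choice: a singleton equidistant from two families is left as-is rather than assigned arbitrarily, trading some recovered reads for fewer false merges.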