DOCEST—fast and accurate estimator of human NGS sequencing depth and error rate

MOTIVATION: Accurate estimation of next-generation sequencing depth of coverage is needed for detecting the copy number of repeated elements in the human genome. The common methods for estimating sequencing depth are based on counting the number of reads mapped to the genome or subgenomic regions. S...

Bibliographic Details
Main Authors: Kaplinski, Lauris; Möls, Märt; Puurand, Tarmo; Remm, Maido
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10460481/
https://www.ncbi.nlm.nih.gov/pubmed/37641716
http://dx.doi.org/10.1093/bioadv/vbad084
author Kaplinski, Lauris
Möls, Märt
Puurand, Tarmo
Remm, Maido
author_facet Kaplinski, Lauris
Möls, Märt
Puurand, Tarmo
Remm, Maido
author_sort Kaplinski, Lauris
collection PubMed
description MOTIVATION: Accurate estimation of next-generation sequencing depth of coverage is needed for detecting the copy number of repeated elements in the human genome. The common methods for estimating sequencing depth are based on counting the number of reads mapped to the genome or subgenomic regions. Such methods are sensitive to the mapping quality. The presence of contamination or the large deviance of an individual genome from the reference may introduce bias in depth estimation. RESULTS: Here, we present an algorithm and implementation for estimating both the sequencing depth and error rate from unmapped reads using a uniquely filtered k-mer set. On simulated reads with 20× coverage, the margin of error was less than 0.01%. At 0.01× coverage and the presence of 10-fold contamination, the precision was within 2% for depth and within 10% for error rate. AVAILABILITY AND IMPLEMENTATION: DOCEST program and database can be downloaded from https://bioinfo.ut.ee/docest/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-10460481
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-104604812023-08-28 DOCEST—fast and accurate estimator of human NGS sequencing depth and error rate Kaplinski, Lauris Möls, Märt Puurand, Tarmo Remm, Maido Bioinform Adv Application Note MOTIVATION: Accurate estimation of next-generation sequencing depth of coverage is needed for detecting the copy number of repeated elements in the human genome. The common methods for estimating sequencing depth are based on counting the number of reads mapped to the genome or subgenomic regions. Such methods are sensitive to the mapping quality. The presence of contamination or the large deviance of an individual genome from the reference may introduce bias in depth estimation. RESULTS: Here, we present an algorithm and implementation for estimating both the sequencing depth and error rate from unmapped reads using a uniquely filtered k-mer set. On simulated reads with 20× coverage, the margin of error was less than 0.01%. At 0.01× coverage and the presence of 10-fold contamination, the precision was within 2% for depth and within 10% for error rate. AVAILABILITY AND IMPLEMENTATION: DOCEST program and database can be downloaded from https://bioinfo.ut.ee/docest/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. Oxford University Press 2023-07-18 /pmc/articles/PMC10460481/ /pubmed/37641716 http://dx.doi.org/10.1093/bioadv/vbad084 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Application Note
Kaplinski, Lauris
Möls, Märt
Puurand, Tarmo
Remm, Maido
DOCEST—fast and accurate estimator of human NGS sequencing depth and error rate
title DOCEST—fast and accurate estimator of human NGS sequencing depth and error rate
title_full DOCEST—fast and accurate estimator of human NGS sequencing depth and error rate
title_fullStr DOCEST—fast and accurate estimator of human NGS sequencing depth and error rate
title_full_unstemmed DOCEST—fast and accurate estimator of human NGS sequencing depth and error rate
title_short DOCEST—fast and accurate estimator of human NGS sequencing depth and error rate
title_sort docest—fast and accurate estimator of human ngs sequencing depth and error rate
topic Application Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10460481/
https://www.ncbi.nlm.nih.gov/pubmed/37641716
http://dx.doi.org/10.1093/bioadv/vbad084