
DnoisE: distance denoising by entropy. An open-source parallelizable alternative for denoising sequence datasets

DNA metabarcoding is broadly used in biodiversity studies encompassing a wide range of organisms. Erroneous amplicons, generated during amplification and sequencing procedures, constitute one of the major sources of concern for the interpretation of metabarcoding results. Several denoising programs...


Bibliographic Details
Main authors: Antich, Adrià; Palacín, Creu; Turon, Xavier; Wangensteen, Owen S.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Biodiversity
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8783565/
https://www.ncbi.nlm.nih.gov/pubmed/35111399
http://dx.doi.org/10.7717/peerj.12758
author Antich, Adrià
Palacín, Creu
Turon, Xavier
Wangensteen, Owen S.
collection PubMed
description DNA metabarcoding is broadly used in biodiversity studies encompassing a wide range of organisms. Erroneous amplicons, generated during amplification and sequencing procedures, constitute one of the major sources of concern for the interpretation of metabarcoding results. Several denoising programs have been implemented to detect and eliminate these errors. However, almost all denoising software currently available has been designed to process non-coding ribosomal sequences, most notably prokaryotic 16S rDNA. The growing number of metabarcoding studies using coding markers such as COI or RuBisCO demands a re-assessment and calibration of denoising algorithms. Here we present DnoisE, the first denoising program designed to detect erroneous reads and merge them with the correct ones using information from the natural variability (entropy) associated with each codon position in coding barcodes. We have developed open-source software using a modified version of the UNOISE algorithm. DnoisE implements different merging procedures as options, and can incorporate codon entropy information either retrieved from the data or supplied by the user. In addition, the DnoisE algorithm is parallelizable, greatly reducing runtimes on computer clusters. Our program also accepts different input file formats, so it can be readily incorporated into existing metabarcoding pipelines.
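The description refers to the entropy associated with each codon position. As a rough illustration of that idea only, the sketch below computes the mean Shannon entropy (in bits) of the alignment columns that fall on codon positions 1, 2 and 3. It is not taken from DnoisE: the function name `codon_position_entropy`, the `frame_offset` parameter, and the assumptions that the sequences are already aligned, of equal length, and unweighted by read abundance are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def codon_position_entropy(seqs, frame_offset=0):
    """Mean Shannon entropy (bits) per codon position (1st, 2nd, 3rd).

    Illustrative sketch only, not the DnoisE implementation.
    seqs: aligned, equal-length coding sequences (assumption).
    frame_offset: index of the first column that is a 1st codon position
    (hypothetical parameter for this example).
    """
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    totals = [0.0, 0.0, 0.0]  # summed column entropy per codon position
    counts = [0, 0, 0]        # number of columns per codon position

    for col in range(length):
        # Frequency of each base observed in this column
        freqs = Counter(s[col] for s in seqs)
        n = sum(freqs.values())
        # Shannon entropy of the column: H = -sum(p * log2(p))
        h = -sum((c / n) * math.log2(c / n) for c in freqs.values())
        pos = (col - frame_offset) % 3
        totals[pos] += h
        counts[pos] += 1

    # Average over the columns that belong to each codon position
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Toy example: only the last column (a 3rd codon position) varies
example = ["ATGGCA", "ATGGCC", "ATGGCG", "ATGGCT"]
entropies = codon_position_entropy(example)
```

In this toy input all variation sits at the third codon position, which is the pattern typically expected in coding markers, where third-position changes are often synonymous.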
format Online
Article
Text
id pubmed-8783565
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8783565 2022-02-01 DnoisE: distance denoising by entropy. An open-source parallelizable alternative for denoising sequence datasets Antich, Adrià; Palacín, Creu; Turon, Xavier; Wangensteen, Owen S. PeerJ Biodiversity PeerJ Inc. 2022-01-19 /pmc/articles/PMC8783565/ /pubmed/35111399 http://dx.doi.org/10.7717/peerj.12758 Text en ©2022 Antich et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title DnoisE: distance denoising by entropy. An open-source parallelizable alternative for denoising sequence datasets
topic Biodiversity