On the use of sequence-quality information in OTU clustering

BACKGROUND: High-throughput sequencing has become an essential technology in life science research. Despite continuous improvements in technology, the produced sequences are still not entirely accurate. Consequently, the sequences are usually equipped with error probabilities. The quality informatio...


Bibliographic Details
Main Authors:	Müller, Robert; Nebel, Markus
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8375510/ https://www.ncbi.nlm.nih.gov/pubmed/34458017 http://dx.doi.org/10.7717/peerj.11717

author	Müller, Robert; Nebel, Markus
collection	PubMed
description	BACKGROUND: High-throughput sequencing has become an essential technology in life science research. Despite continuous improvements in technology, the produced sequences are still not entirely accurate. Consequently, the sequences are usually equipped with error probabilities. The quality information is already employed to find better solutions to a number of bioinformatics problems (e.g. read mapping). Data processing pipelines benefit in particular (especially when incorporating the quality information early), since enhanced outcomes of one step can improve all subsequent ones. Preprocessing steps, thus, quite regularly consider the sequence quality to fix errors or discard low-quality data. Other steps, however, like clustering sequences into operational taxonomic units (OTUs), a common task in the analysis of microbial communities, are typically performed without making use of the available quality information.
RESULTS: In this paper, we present quality-aware clustering methods inspired by quality-weighted alignments and model-based denoising, and explore their applicability to OTU clustering. We implemented the quality-aware methods in a revised version of our de novo clustering tool GeFaST and evaluated their clustering quality and performance on mock-community data sets. Quality-weighted alignments were able to improve the clustering quality of GeFaST by up to 10%. The examination of the model-supported methods provided a more diverse picture, hinting at a narrower applicability, but they were able to attain similar improvements. Considering the quality information enlarged both runtime and memory consumption, even though the increase of the former depended heavily on the applied method and clustering threshold.
CONCLUSIONS: The quality-aware methods expand the iterative, de novo clustering approach by new clustering and cluster refinement methods. Our results indicate that OTU clustering constitutes yet another analysis step benefiting from the integration of quality information. Beyond the shown potential, the quality-aware methods offer a range of opportunities for fine-tuning and further extensions.
format	Online Article Text
id	pubmed-8375510
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8375510 2021-08-27 On the use of sequence-quality information in OTU clustering Müller, Robert; Nebel, Markus PeerJ Bioinformatics PeerJ Inc. 2021-08-16 /pmc/articles/PMC8375510/ /pubmed/34458017 http://dx.doi.org/10.7717/peerj.11717 Text en © 2021 Müller and Nebel https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	On the use of sequence-quality information in OTU clustering
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8375510/ https://www.ncbi.nlm.nih.gov/pubmed/34458017 http://dx.doi.org/10.7717/peerj.11717
