
Assembling Reads Improves Taxonomic Classification of Species

Most current approaches to metagenomic classification employ short next-generation sequencing (NGS) reads that are present in metagenomic samples to identify unique genomic regions. NGS reads, however, might not be long enough to differentiate similar genomes. This suggests a potential for using longe...


Bibliographic Details
Main authors:	Tran, Quang, Phan, Vinhthuy
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7465921/ https://www.ncbi.nlm.nih.gov/pubmed/32824429 http://dx.doi.org/10.3390/genes11080946

author	Tran, Quang Phan, Vinhthuy
collection	PubMed
description	Most current approaches to metagenomic classification employ short next-generation sequencing (NGS) reads that are present in metagenomic samples to identify unique genomic regions. NGS reads, however, might not be long enough to differentiate similar genomes. This suggests a potential for using longer reads to improve classification performance. Presently, longer reads tend to have a higher rate of sequencing errors. Thus, given the pros and cons, it remains unclear which type of read is better for metagenomic classification. We compared two taxonomic classification protocols: a traditional assembly-free protocol and a novel assembly-based protocol. The novel assembly-based protocol consists of assembling short reads into longer reads, which are subsequently classified by a traditional taxonomic classifier. We discovered that most classifiers made fewer predictions with longer reads and that they achieved higher classification performance on synthetic metagenomic data. Generally, we observed a significant increase in precision with similar recall rates. On real data, we observed similar characteristics, which suggest that the classifiers might likewise achieve higher precision at similar recall with longer reads. We have shown a noticeable difference in performance between assembly-based and assembly-free taxonomic classification. This finding strongly suggests that classifying species in metagenomic environments can be achieved with higher overall performance simply by assembling short reads. Further, it also suggests that long-read technologies might be better for species classification.
format	Online Article Text
id	pubmed-7465921
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7465921 2020-09-04 Assembling Reads Improves Taxonomic Classification of Species Tran, Quang Phan, Vinhthuy Genes (Basel) Article MDPI 2020-08-17 /pmc/articles/PMC7465921/ /pubmed/32824429 http://dx.doi.org/10.3390/genes11080946 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Assembling Reads Improves Taxonomic Classification of Species
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7465921/ https://www.ncbi.nlm.nih.gov/pubmed/32824429 http://dx.doi.org/10.3390/genes11080946
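
The precision and recall comparison mentioned in the description can be illustrated with a small sketch. This record does not say how the authors computed their metrics, so the set-based definitions below are an assumption, and the species labels and prediction sets are hypothetical toy data, not results from the article. The toy numbers are chosen only to mirror the reported pattern: fewer predictions, higher precision, similar recall.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for a set of predicted species.

    Assumed definitions (not taken from the article):
      precision = correct predictions / all predictions
      recall    = correct predictions / all species truly present
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy sample: four species are truly present.
truth = {"sp_A", "sp_B", "sp_C", "sp_D"}

# Hypothetical classifier output on short reads (assembly-free protocol).
assembly_free = {"sp_A", "sp_B", "sp_C", "sp_X", "sp_Y", "sp_Z"}

# Hypothetical classifier output on assembled reads (assembly-based protocol):
# fewer predictions overall, same number of correct ones.
assembly_based = {"sp_A", "sp_B", "sp_C", "sp_X"}

for name, predicted in [("assembly-free", assembly_free),
                        ("assembly-based", assembly_based)]:
    p, r = precision_recall(predicted, truth)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case both protocols recover three of the four true species, so recall is identical, while the shorter prediction list raises precision from 0.50 to 0.75.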
