
Discovering the Unknown: Improving Detection of Novel Species and Genera from Short Reads


Bibliographic Details
Main authors: Rosen, Gail L., Polikar, Robi, Caseiro, Diamantino A., Essinger, Steven D., Sokhansanj, Bahrad A.
Format: Text
Language: English
Published: Hindawi Publishing Corporation 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3085467/
https://www.ncbi.nlm.nih.gov/pubmed/21541181
http://dx.doi.org/10.1155/2011/495849
Collection: PubMed

Description
High-throughput sequencing technologies enable metagenome profiling, the simultaneous sequencing of multiple microbial species present within an environmental sample. Because metagenomic data include sequence fragments (“reads”) from organisms that are absent from any database, new algorithms must be developed for the identification and annotation of novel sequence fragments. Homology-based techniques have been modified to detect novel species and genera, but composition-based methods have not been adapted. We develop a detection technique that can discriminate between “known” and “unknown” taxa, which can be used with composition-based methods as well as with a hybrid method. Unlike previous studies, we rigorously evaluate all algorithms for their ability to detect novel taxa. First, we show that the integration of a detector with a composition-based method performs significantly better than homology-based methods for the detection of novel species and genera, with best performance at finer taxonomic resolutions. Most importantly, we evaluate all the algorithms by introducing an “unknown” class and show that the modified version of PhymmBL has similar or better overall classification performance than the other modified algorithms, especially at the species level and for ultrashort reads. Finally, we evaluate the performance of several algorithms on a real acid mine drainage dataset.
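The central idea of the abstract, adding an explicit “unknown” class to a composition-based classifier, can be illustrated with a minimal sketch. This is not the authors' detector and not PhymmBL (which builds on interpolated Markov models); it assumes a simple nearest-profile rule over k-mer frequencies plus a rejection threshold. The k-mer length `K`, the `threshold` value, the helper names `kmer_profile`, `cosine` and `classify_read`, and the toy reference sequences are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

K = 3  # hypothetical k-mer length, chosen only for illustration

# All 4^K DNA k-mers in a fixed order, so every profile is a vector of equal length.
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq: str, k: int = K) -> list:
    """Normalized k-mer frequency vector (k-mers containing non-ACGT symbols are ignored)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[km] / total for km in KMERS]


def cosine(a, b) -> float:
    """Cosine similarity between two composition vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify_read(read: str, reference_profiles: dict, threshold: float = 0.8) -> str:
    """Assign the most similar known taxon, or 'unknown' if no taxon reaches the threshold."""
    profile = kmer_profile(read)
    best_taxon, best_score = None, -1.0
    for taxon, ref in reference_profiles.items():
        score = cosine(profile, ref)
        if score > best_score:
            best_taxon, best_score = taxon, score
    # Rejection step: a read unlike every reference is reported as a possible novel taxon.
    return best_taxon if best_score >= threshold else "unknown"


if __name__ == "__main__":
    # Toy "database" of two known taxa (hypothetical sequences).
    refs = {"taxonA": kmer_profile("ACGT" * 50), "taxonB": kmer_profile("AAAT" * 50)}
    print(classify_read("ACGT" * 5, refs))    # taxonA
    print(classify_read("GGGCCC" * 4, refs))  # unknown
```

In this toy run the `GGGCCC` read shares no 3-mers with either reference, so its best similarity is 0.0 and it is rejected as “unknown”. The threshold controls the trade-off between missing novel taxa (threshold too low) and rejecting reads from known taxa (threshold too high).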

Record ID: pubmed-3085467
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Biomed Biotechnol (Research Article), published 2011-03-23
Copyright © 2011 Gail L. Rosen et al.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.