
Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier

As more and more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated instantaneously in response to new genomes. Currently, some supervised classifiers have been developed to assess the o...


Bibliographic Details
Main authors: Hou, Tao; Liu, Fu; Liu, Yun; Zou, Qing Yu; Zhang, Xiao; Wang, Ke
Format: Online Article Text
Language: English
Published: Libertas Academica 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4309676/
https://www.ncbi.nlm.nih.gov/pubmed/25673967
http://dx.doi.org/10.4137/EBO.S20523
author Hou, Tao
Liu, Fu
Liu, Yun
Zou, Qing Yu
Zhang, Xiao
Wang, Ke
author_sort Hou, Tao
collection PubMed
description As more and more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated instantaneously in response to new genomes. Currently, some supervised classifiers have been developed to assign metagenomic sequences to their source organisms. We have found that existing supervised classifiers usually cannot accurately discriminate between training data from different classes when the data contain outliers. However, the training genomic data (bacterial and archaeal genomes) usually contain a proportion of outliers arising from sequencing errors, phage invasions, highly expressed genes, and other sources. These outliers act as noise and hinder the development of classifiers with better prediction accuracy. To solve this problem, we present a robust supervised classifier, weighted support vector domain description (WSVDD), which can eliminate the interference of some outliers in the training genomic data and thereby generate more accurate data domain descriptions for each taxonomic class. The experimental results demonstrate that WSVDD is more robust than other classifiers on simulated Sanger and 454 reads with different outlier rates. In addition, in experiments on simulated metagenomes and real gut metagenomes, WSVDD achieved better prediction accuracy than other classifiers.
format Online
Article
Text
id pubmed-4309676
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4309676 2015-02-11 Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier Hou, Tao Liu, Fu Liu, Yun Zou, Qing Yu Zhang, Xiao Wang, Ke Evol Bioinform Online Methodology Libertas Academica 2015-01-26 /pmc/articles/PMC4309676/ /pubmed/25673967 http://dx.doi.org/10.4137/EBO.S20523 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
title Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4309676/
https://www.ncbi.nlm.nih.gov/pubmed/25673967
http://dx.doi.org/10.4137/EBO.S20523