Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier
As more and more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated instantaneously in response to new genomes. Currently, some supervised classifiers have been developed to assess the o...
Main authors: | Hou, Tao; Liu, Fu; Liu, Yun; Zou, Qing Yu; Zhang, Xiao; Wang, Ke |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2015 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4309676/ https://www.ncbi.nlm.nih.gov/pubmed/25673967 http://dx.doi.org/10.4137/EBO.S20523 |
_version_ | 1782354739425443840 |
---|---|
author | Hou, Tao Liu, Fu Liu, Yun Zou, Qing Yu Zhang, Xiao Wang, Ke |
author_sort | Hou, Tao |
collection | PubMed |
description | As more and more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated instantaneously in response to new genomes. Currently, some supervised classifiers have been developed to assess the organism of metagenomic sequences. We have found that the existing supervised classifiers usually cannot discriminate the training data from different classes accurately when the data contain some outliers. However, the training genomic data (bacterial and archaeal genomes) usually contain a portion of outliers, which come from sequencing errors, phage invasions, and some highly expressed genes, etc. The outliers, treated as noises, prohibit the development of classifiers with better prediction accuracy. To solve the problem, we present a robust supervised classifier, weighted support vector domain description (WSVDD), which can eliminate the interference from some outliers for training genomic data and then generate more accurate data domain descriptions for each taxonomic class. The experimental results demonstrate WSVDD is more robust than other classifiers for simulated Sanger and 454 reads with different outlier rates. In addition, in experiments performed on simulated metagenomes and real gut metagenomes, WSVDD also achieved better prediction accuracy than other classifiers. |
format | Online Article Text |
id | pubmed-4309676 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-43096762015-02-11 Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier Hou, Tao Liu, Fu Liu, Yun Zou, Qing Yu Zhang, Xiao Wang, Ke Evol Bioinform Online Methodology As more and more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated instantaneously in response to new genomes. Currently, some supervised classifiers have been developed to assess the organism of metagenomic sequences. We have found that the existing supervised classifiers usually cannot discriminate the training data from different classes accurately when the data contain some outliers. However, the training genomic data (bacterial and archaeal genomes) usually contain a portion of outliers, which come from sequencing errors, phage invasions, and some highly expressed genes, etc. The outliers, treated as noises, prohibit the development of classifiers with better prediction accuracy. To solve the problem, we present a robust supervised classifier, weighted support vector domain description (WSVDD), which can eliminate the interference from some outliers for training genomic data and then generate more accurate data domain descriptions for each taxonomic class. The experimental results demonstrate WSVDD is more robust than other classifiers for simulated Sanger and 454 reads with different outlier rates. In addition, in experiments performed on simulated metagenomes and real gut metagenomes, WSVDD also achieved better prediction accuracy than other classifiers. Libertas Academica 2015-01-26 /pmc/articles/PMC4309676/ /pubmed/25673967 http://dx.doi.org/10.4137/EBO.S20523 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License. |
spellingShingle | Methodology Hou, Tao Liu, Fu Liu, Yun Zou, Qing Yu Zhang, Xiao Wang, Ke Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier |
title | Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4309676/ https://www.ncbi.nlm.nih.gov/pubmed/25673967 http://dx.doi.org/10.4137/EBO.S20523 |