
Alignment-free supervised classification of metagenomes by recursive SVM

BACKGROUND: Comparison and classification of metagenome samples is one of the major tasks in the study of microbial communities of natural environments or niches on human bodies. Bioinformatics methods play important roles in this task, including 16S rRNA gene analysis and some alignment-based or al...


Bibliographic Details
Main Authors: Cui, Hongfei; Zhang, Xuegong
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3849074/
https://www.ncbi.nlm.nih.gov/pubmed/24053649
http://dx.doi.org/10.1186/1471-2164-14-641
author Cui, Hongfei
Zhang, Xuegong
collection PubMed
description BACKGROUND: Comparison and classification of metagenome samples is one of the major tasks in the study of microbial communities of natural environments or niches on human bodies. Bioinformatics methods play important roles in this task, including 16S rRNA gene analysis and some alignment-based or alignment-free methods on metagenomic data. Alignment-free methods have the advantage of not depending on known genome annotations and therefore have high potential for studying complicated microbiomes. However, the existing alignment-free methods are all based on unsupervised learning strategies (e.g., PCA or hierarchical clustering). These methods are powerful in revealing major similarities and grouping relations between microbiome samples, but they cannot be applied to discriminate predefined classes of interest, which might not be the dominant grouping in the data. Supervised classification is needed in the latter scenario, with the goal of classifying samples into predefined classes and finding the features that discriminate the classes. The effectiveness of supervised classification with alignment-based features on metagenomic data has been shown in some recent studies. The application of alignment-free supervised classification methods to metagenome data has not yet been well explored.
RESULTS: We developed a method for this task using k-tuple frequencies, counted directly from metagenome short reads, as features and the R-SVM (Recursive SVM) for feature selection and classification. We tested our method on a simulated dataset, a real dataset composed of several known genomes, and a real metagenome NGS short-read dataset. Experiments on simulated data showed that the method can classify the classes almost perfectly and can recover the major sequence signatures that distinguish the two classes. On the real human gut metagenome data, the method can discriminate samples of inflammatory bowel disease (IBD) patients from control samples with high accuracy, whereas the two groups cannot be separated when the samples are compared with unsupervised clustering approaches.
CONCLUSIONS: The proposed alignment-free supervised classification method can perform well in discriminating metagenomic samples of predefined classes and in selecting characteristic sequence features for the discrimination. This study serves as an example of the feasibility of using metagenome sequence features of microbiomes on human bodies to study specific human health conditions with supervised machine learning methods.
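The abstract above describes features built from k-tuple frequencies counted directly from short reads. The following is only a minimal sketch of that counting step, not the authors' implementation: the function name `ktuple_frequencies`, the choice to skip windows containing ambiguous bases, the normalization to relative frequencies, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def ktuple_frequencies(reads, k=4):
    """Return relative frequencies of all 4**k DNA k-tuples over the reads.

    Assumption (not stated in the abstract): windows containing any
    character other than A, C, G, T are skipped, and counts are divided
    by the total number of valid windows.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows with ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Fixed lexicographic order so every sample yields a comparable vector.
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in alphabet}


# Hypothetical toy reads, not data from the study.
toy_reads = ["ACGTAC", "ACGNAC"]
freqs = ktuple_frequencies(toy_reads, k=2)
```

Each sample would then be represented by one such vector of 4^k values, which a supervised classifier could take as input. The recursive SVM feature-selection step named in the abstract is not sketched here.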
format Online
Article
Text
id pubmed-3849074
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3849074 2013-12-07 Alignment-free supervised classification of metagenomes by recursive SVM Cui, Hongfei Zhang, Xuegong BMC Genomics Research Article BioMed Central 2013-09-22 /pmc/articles/PMC3849074/ /pubmed/24053649 http://dx.doi.org/10.1186/1471-2164-14-641 Text en Copyright © 2013 Cui and Zhang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Alignment-free supervised classification of metagenomes by recursive SVM
topic Research Article