Alignment-free supervised classification of metagenomes by recursive SVM
BACKGROUND: Comparison and classification of metagenome samples is one of the major tasks in the study of microbial communities of natural environments or niches on human bodies. Bioinformatics methods play important roles in this task, including 16S rRNA gene analysis and some alignment-based or al...
Main Authors: | Cui, Hongfei; Zhang, Xuegong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3849074/ https://www.ncbi.nlm.nih.gov/pubmed/24053649 http://dx.doi.org/10.1186/1471-2164-14-641 |
_version_ | 1782293873089839104 |
---|---|
author | Cui, Hongfei Zhang, Xuegong |
author_facet | Cui, Hongfei Zhang, Xuegong |
author_sort | Cui, Hongfei |
collection | PubMed |
description | BACKGROUND: Comparison and classification of metagenome samples is one of the major tasks in the study of microbial communities of natural environments or niches on human bodies. Bioinformatics methods play important roles in this task, including 16S rRNA gene analysis and some alignment-based or alignment-free methods on metagenomic data. Alignment-free methods have the advantage of not depending on known genome annotations and therefore have high potential in studying complicated microbiomes. However, the existing alignment-free methods are all based on unsupervised learning strategies (e.g., PCA or hierarchical clustering). These types of methods are powerful in revealing major similarities and grouping relations between microbiome samples, but cannot be applied to discriminate predefined classes of interest which might not be the dominating assortment in the data. Supervised classification is needed in the latter scenario, with the goal of classifying samples into predefined classes and finding the features that can discriminate the classes. The effectiveness of supervised classification with alignment-based features on metagenomic data has been shown in some recent studies. The application of alignment-free supervised classification methods to metagenome data has not been well explored yet. RESULTS: We developed a method for this task using k-tuple frequencies, counted directly from metagenome short reads, as features and the R-SVM (Recursive SVM) for feature selection and classification. We tested our method on a simulated dataset, a real dataset composed of several known genomes, and a real metagenome NGS short-read dataset. Experiments on simulated data showed that the method can separate the classes almost perfectly and can recover major sequence signatures that distinguish the two classes. On the real human gut metagenome data, the method can discriminate samples of inflammatory bowel disease (IBD) patients from control samples with high accuracy, whereas the two groups cannot be separated when the samples are compared with unsupervised clustering approaches. CONCLUSIONS: The proposed alignment-free supervised classification method performs well in discriminating metagenomic samples of predefined classes and in selecting characteristic sequence features for the discrimination. This study serves as an example of the feasibility of using metagenome sequence features of microbiomes on human bodies to study specific human health conditions using supervised machine learning methods. |
format | Online Article Text |
id | pubmed-3849074 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3849074 2013-12-07 Alignment-free supervised classification of metagenomes by recursive SVM Cui, Hongfei Zhang, Xuegong BMC Genomics Research Article BACKGROUND: Comparison and classification of metagenome samples is one of the major tasks in the study of microbial communities of natural environments or niches on human bodies. Bioinformatics methods play important roles in this task, including 16S rRNA gene analysis and some alignment-based or alignment-free methods on metagenomic data. Alignment-free methods have the advantage of not depending on known genome annotations and therefore have high potential in studying complicated microbiomes. However, the existing alignment-free methods are all based on unsupervised learning strategies (e.g., PCA or hierarchical clustering). These types of methods are powerful in revealing major similarities and grouping relations between microbiome samples, but cannot be applied to discriminate predefined classes of interest which might not be the dominating assortment in the data. Supervised classification is needed in the latter scenario, with the goal of classifying samples into predefined classes and finding the features that can discriminate the classes. The effectiveness of supervised classification with alignment-based features on metagenomic data has been shown in some recent studies. The application of alignment-free supervised classification methods to metagenome data has not been well explored yet. RESULTS: We developed a method for this task using k-tuple frequencies, counted directly from metagenome short reads, as features and the R-SVM (Recursive SVM) for feature selection and classification. We tested our method on a simulated dataset, a real dataset composed of several known genomes, and a real metagenome NGS short-read dataset. Experiments on simulated data showed that the method can separate the classes almost perfectly and can recover major sequence signatures that distinguish the two classes. On the real human gut metagenome data, the method can discriminate samples of inflammatory bowel disease (IBD) patients from control samples with high accuracy, whereas the two groups cannot be separated when the samples are compared with unsupervised clustering approaches. CONCLUSIONS: The proposed alignment-free supervised classification method performs well in discriminating metagenomic samples of predefined classes and in selecting characteristic sequence features for the discrimination. This study serves as an example of the feasibility of using metagenome sequence features of microbiomes on human bodies to study specific human health conditions using supervised machine learning methods. BioMed Central 2013-09-22 /pmc/articles/PMC3849074/ /pubmed/24053649 http://dx.doi.org/10.1186/1471-2164-14-641 Text en Copyright © 2013 Cui and Zhang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Cui, Hongfei Zhang, Xuegong Alignment-free supervised classification of metagenomes by recursive SVM |
title | Alignment-free supervised classification of metagenomes by recursive SVM |
title_full | Alignment-free supervised classification of metagenomes by recursive SVM |
title_fullStr | Alignment-free supervised classification of metagenomes by recursive SVM |
title_full_unstemmed | Alignment-free supervised classification of metagenomes by recursive SVM |
title_short | Alignment-free supervised classification of metagenomes by recursive SVM |
title_sort | alignment-free supervised classification of metagenomes by recursive svm |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3849074/ https://www.ncbi.nlm.nih.gov/pubmed/24053649 http://dx.doi.org/10.1186/1471-2164-14-641 |
work_keys_str_mv | AT cuihongfei alignmentfreesupervisedclassificationofmetagenomesbyrecursivesvm AT zhangxuegong alignmentfreesupervisedclassificationofmetagenomesbyrecursivesvm |
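
The abstract describes using k-tuple frequencies, counted directly from short reads, as the features for classification. The following is a minimal sketch of that feature-extraction step only, not the authors' implementation. The function name `ktuple_frequencies`, the choice to skip windows containing ambiguous bases, and the normalization by the total count are all assumptions made for illustration; the record does not say how the paper handles these details, and the R-SVM step is not shown.

```python
from collections import Counter
from itertools import product


def ktuple_frequencies(reads, k=4):
    """Return normalized k-tuple frequencies over a collection of reads.

    Hypothetical helper for illustration: one sample (a list of read
    strings) becomes one feature vector with 4**k entries.
    """
    counts = Counter()
    valid = set("ACGT")
    for read in reads:
        seq = read.upper()
        # Slide a window of length k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: windows with ambiguous bases (e.g. N) are skipped.
            if set(kmer) <= valid:
                counts[kmer] += 1
    total = sum(counts.values())
    # Fixed feature order: all 4**k tuples in lexicographic order.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    # Assumption: counts are normalized so the frequencies sum to 1.
    return {kmer: counts[kmer] / total for kmer in all_kmers}


sample_reads = ["ACGTAC", "GTNACG"]
features = ktuple_frequencies(sample_reads, k=2)
print(features["AC"])
```

Each sample yields a vector of the same length regardless of how many reads it contains, which is the property that lets a supervised classifier compare samples without aligning reads to reference genomes.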