DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection

Bibliographic Details
Main Authors: Ding, Xiao; Cheng, Fudong; Cao, Changchang; Sun, Xiao
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4596415/
https://www.ncbi.nlm.nih.gov/pubmed/26446672
http://dx.doi.org/10.1186/s12859-015-0753-3
author Ding, Xiao
Cheng, Fudong
Cao, Changchang
Sun, Xiao
author_sort Ding, Xiao
collection PubMed
description BACKGROUND: Continual progress in next-generation sequencing makes it possible to generate increasingly large metagenomes that span time or space. Comparing and classifying metagenomes from different microbial communities is critical. Alignment-free supervised classification is important for discriminating between the multifarious components of metagenomic samples, because it can be accomplished independently of known microbial genomes. RESULTS: We propose an alignment-free supervised metagenomic classification method called DectICO. The intrinsic correlation of oligonucleotides (ICO) provides the feature set, which is selected dynamically using a kernel partial least squares algorithm, and the feature matrices extracted with this set are sequentially employed to train support vector machine (SVM) classifiers. We evaluated the classification performance of DectICO on three real metagenomic sequencing datasets, two containing deep sequencing metagenomes and one of low coverage. Validation results show that DectICO is powerful, performs well with long oligonucleotides (i.e., 6-mers to 8-mers), and is more stable and generalizable than a sequence-composition-based method. The classifiers trained by our method are more accurate than those trained with non-dynamic feature selection methods and with a recently published recursive-SVM-based classification approach. CONCLUSIONS: The alignment-free supervised classification method DectICO can accurately classify metagenomic samples without dependence on known microbial genomes. Selecting the ICO dynamically offers better stability and generality compared with sequence-composition-based classification algorithms. Our proposed method provides new insights into metagenomic sample classification. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0753-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4596415
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4596415 2015-10-08 DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection Ding, Xiao Cheng, Fudong Cao, Changchang Sun, Xiao BMC Bioinformatics Methodology Article BioMed Central 2015-10-07 /pmc/articles/PMC4596415/ /pubmed/26446672 http://dx.doi.org/10.1186/s12859-015-0753-3 Text en © Ding et al. 2015 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4596415/
https://www.ncbi.nlm.nih.gov/pubmed/26446672
http://dx.doi.org/10.1186/s12859-015-0753-3