Classification of genomic islands using decision trees and their ensemble algorithms

Bibliographic Details
Main Authors: Che, Dongsheng; Hockenbury, Cory; Marmelstein, Robert; Rasheed, Khaled
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2975412/
https://www.ncbi.nlm.nih.gov/pubmed/21047376
http://dx.doi.org/10.1186/1471-2164-11-S2-S1
collection PubMed
description BACKGROUND: Genomic islands (GIs) are clusters of alien genes that occur in some bacterial genomes but are absent from the genomes of other strains within the same genus. The detection of GIs is highly important to the medical and environmental communities. Although several GI-associated features have been discovered, accurate detection of GIs is still far from satisfactory.
RESULTS: In this paper, we combined multiple GI-associated features, and applied and compared various machine learning approaches to evaluate classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus, and Streptococcus) and on a mixed dataset of all three genera. The experimental results show that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms, namely AdaBoost, bagging, MultiBoost, and random forest, to the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy.
CONCLUSIONS: We conclude that decision-tree-based ensemble algorithms can accurately classify GIs and non-GIs, and we recommend these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.
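The ensemble idea in the abstract (many decision trees whose individual predictions are combined into one vote) can be illustrated with bagging. The following is a minimal, self-contained sketch under stated assumptions, not the authors' GIDetector code and not Weka's J48: the trees here split on plain entropy-based information gain and are not pruned (J48, a C4.5 implementation, uses gain ratio and pruning). The feature names (`gc_deviation`, `has_integrase`, `codon_bias`), the toy values, and all function names are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter
from math import log2

# Hypothetical toy data: each row is [gc_deviation, has_integrase, codon_bias].
# Feature names and values are invented for illustration, NOT taken from the paper.
X = [[0.12, 1, 0.8], [0.10, 1, 0.7], [0.15, 1, 0.9], [0.09, 0, 0.6],
     [0.01, 0, 0.2], [0.02, 0, 0.3], [0.00, 1, 0.1], [0.03, 0, 0.25]]
y = ["GI", "GI", "GI", "GI", "non-GI", "non-GI", "non-GI", "non-GI"]


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) pair with the highest information gain, or None."""
    base, n = entropy(labels), len(labels)
    best, best_gain = None, 0.0
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
            if gain > best_gain:
                best, best_gain = (f, t), gain
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow an unpruned binary tree; leaves are class labels, inner nodes are tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf and return that leaf's class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def bagging(rows, labels, n_trees=15, seed=0):
    """Train n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees


def predict_ensemble(trees, row):
    """Combine the individual trees' predictions by unweighted majority vote."""
    return Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0]


trees = bagging(X, y)
print(predict_ensemble(trees, [0.11, 1, 0.75]))  # a GI-like query row
print(predict_ensemble(trees, [0.00, 0, 0.05]))  # a non-GI-like query row
```

Bagging was chosen because it is the simplest of the four ensemble schemes named in the abstract: each tree sees a different bootstrap sample, and the vote smooths out individual trees' errors. Boosting methods such as AdaBoost and MultiBoost instead reweight misclassified examples between rounds. In practice one would more likely use an existing toolkit, for example Weka's `J48`, `Bagging`, `AdaBoostM1`, `MultiBoostAB`, and `RandomForest` classifiers, than hand-rolled trees.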
id pubmed-2975412
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Genomics (Research), BioMed Central, published 2010-11-02; record pubmed-2975412, dated 2010-11-09
rights Copyright © 2010 Che et al.; licensee BioMed Central Ltd.
license This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research