
A genetic algorithm-Bayesian network approach for the analysis of metabolomics and spectroscopic data: application to the rapid identification of Bacillus spores and classification of Bacillus species

BACKGROUND: The rapid identification of Bacillus spores and bacterial identification are paramount because of their implications in food poisoning, pathogenesis and their use as potential biowarfare agents. Many automated analytical techniques such as Curie-point pyrolysis mass spectrometry (Py-MS)...

Bibliographic Details
Main Authors: Correa, Elon; Goodacre, Royston
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228543/
https://www.ncbi.nlm.nih.gov/pubmed/21269434
http://dx.doi.org/10.1186/1471-2105-12-33
_version_ 1782217830021726208
author Correa, Elon
Goodacre, Royston
collection PubMed
description BACKGROUND: The rapid identification of Bacillus spores and bacterial identification are paramount because of their implications in food poisoning, pathogenesis and their use as potential biowarfare agents. Many automated analytical techniques such as Curie-point pyrolysis mass spectrometry (Py-MS) have been used to identify bacterial spores, giving rise to large amounts of analytical data. This high number of features makes interpretation of the data extremely difficult. We analysed Py-MS data from 36 different strains of aerobic endospore-forming bacteria encompassing seven different species. These bacteria were grown axenically on nutrient agar, and vegetative biomass and spores were analysed by Curie-point Py-MS. RESULTS: We develop a novel genetic algorithm-Bayesian network algorithm that accurately identifies and selects a small subset of key relevant mass spectra (biomarkers) to be further analysed. Once identified, this subset of relevant biomarkers was then used to identify Bacillus spores successfully and to identify Bacillus species via a Bayesian network model specifically built for this reduced set of features. CONCLUSIONS: This final compact Bayesian network classification model is parsimonious and computationally fast to run, and its graphical visualization allows easy interpretation of the probabilistic relationships among selected biomarkers. In addition, we compare the features selected by the genetic algorithm-Bayesian network (GA-BN) approach with the features selected by partial least squares-discriminant analysis (PLS-DA). The classification accuracy results show that the set of features selected by the GA-BN is far superior to the set selected by PLS-DA.
format Online
Article
Text
id pubmed-3228543
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3228543 2011-12-07 A genetic algorithm-Bayesian network approach for the analysis of metabolomics and spectroscopic data: application to the rapid identification of Bacillus spores and classification of Bacillus species Correa, Elon Goodacre, Royston BMC Bioinformatics Methodology Article BACKGROUND: The rapid identification of Bacillus spores and bacterial identification are paramount because of their implications in food poisoning, pathogenesis and their use as potential biowarfare agents. Many automated analytical techniques such as Curie-point pyrolysis mass spectrometry (Py-MS) have been used to identify bacterial spores, giving rise to large amounts of analytical data. This high number of features makes interpretation of the data extremely difficult. We analysed Py-MS data from 36 different strains of aerobic endospore-forming bacteria encompassing seven different species. These bacteria were grown axenically on nutrient agar, and vegetative biomass and spores were analysed by Curie-point Py-MS. RESULTS: We develop a novel genetic algorithm-Bayesian network algorithm that accurately identifies and selects a small subset of key relevant mass spectra (biomarkers) to be further analysed. Once identified, this subset of relevant biomarkers was then used to identify Bacillus spores successfully and to identify Bacillus species via a Bayesian network model specifically built for this reduced set of features. CONCLUSIONS: This final compact Bayesian network classification model is parsimonious and computationally fast to run, and its graphical visualization allows easy interpretation of the probabilistic relationships among selected biomarkers. In addition, we compare the features selected by the genetic algorithm-Bayesian network (GA-BN) approach with the features selected by partial least squares-discriminant analysis (PLS-DA). The classification accuracy results show that the set of features selected by the GA-BN is far superior to the set selected by PLS-DA.
BioMed Central 2011-01-26 /pmc/articles/PMC3228543/ /pubmed/21269434 http://dx.doi.org/10.1186/1471-2105-12-33 Text en Copyright ©2011 Correa and Goodacre; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A genetic algorithm-Bayesian network approach for the analysis of metabolomics and spectroscopic data: application to the rapid identification of Bacillus spores and classification of Bacillus species
topic Methodology Article