Selection of microbial biomarkers with genetic algorithm and principal component analysis
BACKGROUND: Principal components analysis (PCA) is often used to find characteristic patterns associated with certain diseases by reducing variable numbers before a predictive model is built, particularly when some variables are correlated. Usually, the first two or three components from PCA are use...
Main Authors: | Zhang, Ping; West, Nicholas P.; Chen, Pin-Yen; Thang, Mike W. C.; Price, Gareth; Cripps, Allan W.; Cox, Amanda J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6904994/ https://www.ncbi.nlm.nih.gov/pubmed/31823717 http://dx.doi.org/10.1186/s12859-019-3001-4 |
_version_ | 1783478089513172992 |
---|---|
author | Zhang, Ping West, Nicholas P. Chen, Pin-Yen Thang, Mike W. C. Price, Gareth Cripps, Allan W. Cox, Amanda J. |
author_facet | Zhang, Ping West, Nicholas P. Chen, Pin-Yen Thang, Mike W. C. Price, Gareth Cripps, Allan W. Cox, Amanda J. |
author_sort | Zhang, Ping |
collection | PubMed |
description | BACKGROUND: Principal components analysis (PCA) is often used to find characteristic patterns associated with certain diseases by reducing variable numbers before a predictive model is built, particularly when some variables are correlated. Usually, the first two or three components from PCA are used to determine whether individuals can be clustered into two classification groups based on pre-determined criteria: control and disease group. However, a combination of other components may exist which better distinguish diseased individuals from healthy controls. Genetic algorithms (GAs) can be useful and efficient for searching the best combination of variables to build a prediction model. This study aimed to develop a prediction model that combines PCA and a genetic algorithm (GA) for identifying sets of bacterial species associated with obesity and metabolic syndrome (Mets). RESULTS: The prediction models built using the combination of principal components (PCs) selected by GA were compared to the models built using the top PCs that explained the most variance in the sample and to models built with selected original variables. The advantages of combining PCA with GA were demonstrated. CONCLUSIONS: The proposed algorithm overcomes the limitation of PCA for data analysis. It offers a new way to build prediction models that may improve the prediction accuracy. The variables included in the PCs that were selected by GA can be combined with flexibility for potential clinical applications. The algorithm can be useful for many biological studies where high dimensional data are collected with highly correlated variables. |
format | Online Article Text |
id | pubmed-6904994 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6904994 2019-12-11 Selection of microbial biomarkers with genetic algorithm and principal component analysis Zhang, Ping West, Nicholas P. Chen, Pin-Yen Thang, Mike W. C. Price, Gareth Cripps, Allan W. Cox, Amanda J. BMC Bioinformatics Research BACKGROUND: Principal components analysis (PCA) is often used to find characteristic patterns associated with certain diseases by reducing variable numbers before a predictive model is built, particularly when some variables are correlated. Usually, the first two or three components from PCA are used to determine whether individuals can be clustered into two classification groups based on pre-determined criteria: control and disease group. However, a combination of other components may exist which better distinguish diseased individuals from healthy controls. Genetic algorithms (GAs) can be useful and efficient for searching the best combination of variables to build a prediction model. This study aimed to develop a prediction model that combines PCA and a genetic algorithm (GA) for identifying sets of bacterial species associated with obesity and metabolic syndrome (Mets). RESULTS: The prediction models built using the combination of principal components (PCs) selected by GA were compared to the models built using the top PCs that explained the most variance in the sample and to models built with selected original variables. The advantages of combining PCA with GA were demonstrated. CONCLUSIONS: The proposed algorithm overcomes the limitation of PCA for data analysis. It offers a new way to build prediction models that may improve the prediction accuracy. The variables included in the PCs that were selected by GA can be combined with flexibility for potential clinical applications. The algorithm can be useful for many biological studies where high dimensional data are collected with highly correlated variables. 
BioMed Central 2019-12-10 /pmc/articles/PMC6904994/ /pubmed/31823717 http://dx.doi.org/10.1186/s12859-019-3001-4 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Zhang, Ping West, Nicholas P. Chen, Pin-Yen Thang, Mike W. C. Price, Gareth Cripps, Allan W. Cox, Amanda J. Selection of microbial biomarkers with genetic algorithm and principal component analysis |
title | Selection of microbial biomarkers with genetic algorithm and principal component analysis |
title_full | Selection of microbial biomarkers with genetic algorithm and principal component analysis |
title_fullStr | Selection of microbial biomarkers with genetic algorithm and principal component analysis |
title_full_unstemmed | Selection of microbial biomarkers with genetic algorithm and principal component analysis |
title_short | Selection of microbial biomarkers with genetic algorithm and principal component analysis |
title_sort | selection of microbial biomarkers with genetic algorithm and principal component analysis |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6904994/ https://www.ncbi.nlm.nih.gov/pubmed/31823717 http://dx.doi.org/10.1186/s12859-019-3001-4 |
work_keys_str_mv | AT zhangping selectionofmicrobialbiomarkerswithgeneticalgorithmandprincipalcomponentanalysis AT westnicholasp selectionofmicrobialbiomarkerswithgeneticalgorithmandprincipalcomponentanalysis AT chenpinyen selectionofmicrobialbiomarkerswithgeneticalgorithmandprincipalcomponentanalysis AT thangmikewc selectionofmicrobialbiomarkerswithgeneticalgorithmandprincipalcomponentanalysis AT pricegareth selectionofmicrobialbiomarkerswithgeneticalgorithmandprincipalcomponentanalysis AT crippsallanw selectionofmicrobialbiomarkerswithgeneticalgorithmandprincipalcomponentanalysis AT coxamandaj selectionofmicrobialbiomarkerswithgeneticalgorithmandprincipalcomponentanalysis |
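
The abstract in this record describes using a genetic algorithm to search for a combination of principal components that separates disease from control better than the first two or three components. The sketch below illustrates that general idea only; it is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`centroid_accuracy`, `select_components`, `make_toy_scores`), the nearest-centroid fitness measure, the GA settings, and the synthetic data, which stands in for PC scores that would already have been computed by PCA. The toy data places the class signal in a low-variance component to mirror the situation the abstract describes.

```python
import random


def centroid_accuracy(scores, labels, mask):
    """Training accuracy of a nearest-centroid rule that uses only the
    components switched on in `mask` (1 = use, 0 = ignore)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty selection cannot classify anything
    centroids = {}
    for cls in set(labels):
        rows = [scores[i] for i, y in enumerate(labels) if y == cls]
        centroids[cls] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for row, y in zip(scores, labels):
        def dist(cls):
            return sum((row[j] - c) ** 2 for j, c in zip(cols, centroids[cls]))
        correct += min(sorted(centroids), key=dist) == y
    return correct / len(labels)


def select_components(scores, labels, pop_size=20, generations=30,
                      mutation_rate=0.1, seed=0):
    """Hypothetical GA over bitmasks of components (needs >= 2 components)."""
    rng = random.Random(seed)
    n = len(scores[0])

    def fitness(mask):
        return centroid_accuracy(scores, labels, mask)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    # Seed the search with the conventional "first two components" choice,
    # so the result is never worse than that baseline on this fitness.
    population[0] = [1 if j < 2 else 0 for j in range(n)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]  # truncation selection, elite kept
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


def make_toy_scores(n_per_class=20, seed=1):
    """Synthetic stand-in for PC scores: columns 0 and 1 carry the most
    variance but no class signal; column 3 carries the class signal."""
    rng = random.Random(seed)
    scores, labels = [], []
    for cls in (0, 1):
        for _ in range(n_per_class):
            scores.append([rng.gauss(0, 5), rng.gauss(0, 3), rng.gauss(0, 1),
                           rng.gauss(4 * cls, 0.5), rng.gauss(0, 0.3)])
            labels.append(cls)
    return scores, labels


scores, labels = make_toy_scores()
best_mask, best_acc = select_components(scores, labels)
baseline_acc = centroid_accuracy(scores, labels, [1, 1, 0, 0, 0])
print("selected components:", best_mask)
print("GA accuracy:", best_acc, "first-two-PC accuracy:", baseline_acc)
```

The fitness here is accuracy on the same samples used to build the centroids, which is optimistic; a real analysis would score each candidate subset with cross-validation or a held-out set.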