
Using Rule-Based Machine Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer Gene Expression Data


Bibliographic Details
Main Authors:	Glaab, Enrico; Bacardit, Jaume; Garibaldi, Jonathan M.; Krasnogor, Natalio
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2012
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394775/ https://www.ncbi.nlm.nih.gov/pubmed/22808075 http://dx.doi.org/10.1371/journal.pone.0039932

author	Glaab, Enrico; Bacardit, Jaume; Garibaldi, Jonathan M.; Krasnogor, Natalio
collection	PubMed
description	Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL’s classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.
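The abstract compares gene prioritizations by the pointwise mutual information (PMI) between disease terms and the names of top-ranked genes. PMI of a term x and a gene name y is commonly defined as PMI(x, y) = log2[p(x, y) / (p(x) p(y))]. The minimal sketch below estimates these probabilities from document counts in a literature corpus and averages PMI over the top k genes of a ranking. The base-2 logarithm, the document-count estimator, the top-k averaging, the helper names `pmi` and `mean_pmi_top_k`, and all gene names and counts are assumptions for illustration only, not the authors' procedure or data.

```python
import math


def pmi(n_both: int, n_term: int, n_gene: int, n_total: int) -> float:
    """Pointwise mutual information (base 2) between a disease term and a gene name.

    Probabilities are estimated from document counts (an assumption of this sketch):
      n_both  -- documents mentioning both the term and the gene
      n_term  -- documents mentioning the disease term
      n_gene  -- documents mentioning the gene name
      n_total -- total number of documents in the corpus
    """
    if n_both == 0:
        return float("-inf")  # term and gene never co-occur
    p_both = n_both / n_total
    p_term = n_term / n_total
    p_gene = n_gene / n_total
    return math.log2(p_both / (p_term * p_gene))


def mean_pmi_top_k(ranking, gene_counts, n_term, n_total, k):
    """Average PMI between one disease term and the k top-ranked gene names.

    gene_counts maps gene -> (docs mentioning the gene, docs mentioning gene and term).
    """
    top = ranking[:k]
    scores = [pmi(gene_counts[g][1], n_term, gene_counts[g][0], n_total) for g in top]
    return sum(scores) / len(scores)


# Hypothetical corpus counts, for illustration only (not data from the paper).
N_TOTAL = 10_000
N_TERM = 100
gene_counts = {"GENE_A": (200, 50), "GENE_B": (400, 8), "GENE_C": (100, 1)}

# Two hypothetical gene rankings to compare.
ranking_x = ["GENE_A", "GENE_B", "GENE_C"]
ranking_y = ["GENE_C", "GENE_B", "GENE_A"]

print(mean_pmi_top_k(ranking_x, gene_counts, N_TERM, N_TOTAL, k=2))
print(mean_pmi_top_k(ranking_y, gene_counts, N_TERM, N_TOTAL, k=2))
```

Under this scoring, a ranking whose top genes co-occur with the disease term more often than chance would predict receives a higher mean PMI; a PMI of 0 means the term and gene appear together exactly as often as if they were independent.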
format	Online Article Text
id	pubmed-3394775
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3394775 2012-07-17. PLoS One, Research Article. Public Library of Science, 2012-07-11. Text, en. Glaab et al., http://creativecommons.org/licenses/by/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Using Rule-Based Machine Learning for Candidate Disease Gene Prioritization and Sample Classification of Cancer Gene Expression Data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394775/ https://www.ncbi.nlm.nih.gov/pubmed/22808075 http://dx.doi.org/10.1371/journal.pone.0039932
