
Enriching for correct prediction of biological processes using a combination of diverse classifiers

BACKGROUND: Machine learning models (classifiers) for classifying genes to biological processes each have their own unique characteristics in what genes can be classified and to what biological processes. No single learning model is qualitatively superior to any other model and overall precision for...


Bibliographic Details
Main Authors:	Ko, Daijin, Windle, Brad
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3121646/ https://www.ncbi.nlm.nih.gov/pubmed/21605426 http://dx.doi.org/10.1186/1471-2105-12-189

_version_	1782206843006746624
author	Ko, Daijin Windle, Brad
author_facet	Ko, Daijin Windle, Brad
author_sort	Ko, Daijin
collection	PubMed
description	BACKGROUND: Machine learning models (classifiers) for classifying genes to biological processes each have their own unique characteristics in what genes can be classified and to what biological processes. No single learning model is qualitatively superior to any other model and overall precision for each model tends to be low. The classification results for each classifier can be complementary and synergistic suggesting the benefit of a combination of algorithms, but often the prediction probability outputs of various learning models are neither comparable nor compatible for combining. A means to compare outputs regardless of the model and data used and combine the results into an improved comprehensive model is needed. RESULTS: Gene expression patterns from NCI's panel of 60 cell lines were used to train a Random Forest, a Support Vector Machine and a Neural Network model, plus two over-sampled models for classifying genes to biological processes. Each model produced unique characteristics in the classification results. We introduce the Precision Index measure (PIN) from the maximum posterior probability that allows assessing, comparing and combining multiple classifiers. The class specific precision measure (PIC) is introduced and used to select a subset of predictions across all classes and all classifiers with high precision. We developed a single classifier that combines the PINs from these five models in prediction and found that the PIN Combined Classifier (PINCom) significantly increased the number of correctly predicted genes over any single classifier. The PINCom applied to test genes that were not used in training also showed substantial improvement over any single model. 
CONCLUSIONS: This paper introduces novel and effective ways of assessing predictions by their precision and recall plus a method that combines several machine learning models and capitalizes on synergy and complementation in class selection, resulting in higher precision and recall. Different machine learning models yielded incongruent results each of which were successfully combined into one superior model using the PIN measure we developed. Validation of the boosted predictions for gene functions showed the genes to be accurately predicted.
format	Online Article Text
id	pubmed-3121646
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3121646 2011-06-24 Enriching for correct prediction of biological processes using a combination of diverse classifiers Ko, Daijin Windle, Brad BMC Bioinformatics Research Article BioMed Central 2011-05-23 /pmc/articles/PMC3121646/ /pubmed/21605426 http://dx.doi.org/10.1186/1471-2105-12-189 Text en Copyright ©2011 Ko and Windle; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Ko, Daijin Windle, Brad Enriching for correct prediction of biological processes using a combination of diverse classifiers
title	Enriching for correct prediction of biological processes using a combination of diverse classifiers
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3121646/ https://www.ncbi.nlm.nih.gov/pubmed/21605426 http://dx.doi.org/10.1186/1471-2105-12-189
