
Using random forest for reliable classification and cost-sensitive learning for medical diagnosis

BACKGROUND: Most machine-learning classifiers output label predictions for new instances without indicating how reliable the predictions are. The applicability of these classifiers is limited in critical domains where incorrect predictions have serious consequences, like medical diagnosis. Further,...


Bibliographic Details
Main authors:	Yang, Fan; Wang, Hua-zhen; Mi, Hong; Lin, Cheng-de; Cai, Wei-wen
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648734/ https://www.ncbi.nlm.nih.gov/pubmed/19208122 http://dx.doi.org/10.1186/1471-2105-10-S1-S22

author	Yang, Fan; Wang, Hua-zhen; Mi, Hong; Lin, Cheng-de; Cai, Wei-wen
author_facet	Yang, Fan; Wang, Hua-zhen; Mi, Hong; Lin, Cheng-de; Cai, Wei-wen
author_sort	Yang, Fan
collection	PubMed
description	BACKGROUND: Most machine-learning classifiers output label predictions for new instances without indicating how reliable the predictions are. The applicability of these classifiers is limited in critical domains where incorrect predictions have serious consequences, such as medical diagnosis. Further, the default assumption of equal misclassification costs is most likely violated in medical diagnosis. RESULTS: In this paper, we present a modified random forest classifier that is incorporated into the conformal predictor scheme. A conformal predictor is a transductive learning scheme that uses Kolmogorov complexity to test the randomness of a particular sample with respect to the training set. Our method is well calibrated: the performance can be set prior to classification, and the accuracy rate is exactly equal to the predefined confidence level. Further, to address the cost-sensitive problem, we extend our method to a label-conditional predictor, which takes into account different misclassification costs for different classes and allows a different confidence level to be specified for each class. Intensive experiments on benchmark datasets and real-world applications show that the resultant classifier is well calibrated and able to control the specific risk of each class. CONCLUSION: Using the RF outlier measure to design a nonconformity measure benefits the resultant predictor. Further, a label-conditional classifier is developed and turns out to be an alternative approach to the cost-sensitive learning problem, one that relies on label-wise predefined confidence levels. The target of minimizing the risk of misclassification is achieved by specifying a different confidence level for each class.
format	Text
id	pubmed-2648734
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2648734 2009-03-03 Using random forest for reliable classification and cost-sensitive learning for medical diagnosis Yang, Fan; Wang, Hua-zhen; Mi, Hong; Lin, Cheng-de; Cai, Wei-wen BMC Bioinformatics Research BioMed Central 2009-01-30 /pmc/articles/PMC2648734/ /pubmed/19208122 http://dx.doi.org/10.1186/1471-2105-10-S1-S22 Text en Copyright © 2009 Yang et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Using random forest for reliable classification and cost-sensitive learning for medical diagnosis
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648734/ https://www.ncbi.nlm.nih.gov/pubmed/19208122 http://dx.doi.org/10.1186/1471-2105-10-S1-S22
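
The label-conditional idea in the abstract (a separate confidence level for each class) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simpler split (inductive) conformal variant rather than the transductive scheme the abstract describes, and it takes the nonconformity scores as given inputs instead of deriving them from the random forest outlier measure. The function names, class labels, and numbers are all hypothetical.

```python
def label_conditional_p_values(calib_scores, calib_labels, test_scores):
    """Return {label: p-value} for a single test instance.

    calib_scores: nonconformity scores of held-out calibration examples
                  (assumed to come from some scoring model).
    calib_labels: the true labels of those calibration examples.
    test_scores:  {label: nonconformity score the test instance would
                  have if that label were assigned to it}.
    """
    # Group calibration scores by class so each label is judged
    # only against examples of the same class.
    by_label = {}
    for score, label in zip(calib_scores, calib_labels):
        by_label.setdefault(label, []).append(score)

    p_values = {}
    for label, test_score in test_scores.items():
        same_class = by_label.get(label, [])
        # Count same-class scores at least as large as the test score,
        # plus one for the test instance itself.
        at_least = sum(1 for s in same_class if s >= test_score) + 1
        p_values[label] = at_least / (len(same_class) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Keep each label whose p-value exceeds that class's significance level."""
    return {label for label, p in p_values.items() if p > significance[label]}


# Hypothetical scores: higher means "stranger" for the given class.
calib_scores = [0.1, 0.4, 0.9, 0.2, 0.3, 0.8]
calib_labels = ["sick", "sick", "sick", "healthy", "healthy", "healthy"]
test_scores = {"sick": 0.5, "healthy": 0.85}

p = label_conditional_p_values(calib_scores, calib_labels, test_scores)
# A stricter (smaller) significance level for "sick" makes that label
# harder to exclude, reflecting a higher cost of missing it.
labels = prediction_set(p, {"sick": 0.05, "healthy": 0.3})
print(p, labels)
```

Setting the significance level per class is what lets the cost of each kind of error be controlled separately; a label is excluded from the prediction set only when its class-specific p-value falls to or below the level chosen for that class.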
