
Learning to improve medical decision making from imbalanced data without a priori cost

BACKGROUND: In a medical data set, data are commonly composed of a minority (positive or abnormal) group and a majority (negative or normal) group, and the cost of misclassifying a minority sample as a majority sample is very high. This is the so-called imbalanced classification problem. The t...


Bibliographic Details
Main authors:	Wan, Xiang; Liu, Jiming; Cheung, William K; Tong, Tiejun
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4261533/ https://www.ncbi.nlm.nih.gov/pubmed/25480146 http://dx.doi.org/10.1186/s12911-014-0111-9

_version_	1782348286620860416
author	Wan, Xiang; Liu, Jiming; Cheung, William K; Tong, Tiejun
author_facet	Wan, Xiang; Liu, Jiming; Cheung, William K; Tong, Tiejun
author_sort	Wan, Xiang
collection	PubMed
description	BACKGROUND: In a medical data set, data are commonly composed of a minority (positive or abnormal) group and a majority (negative or normal) group, and the cost of misclassifying a minority sample as a majority sample is very high. This is the so-called imbalanced classification problem. Traditional classification functions can be seriously affected by the skewed class distribution in the data. To deal with this problem, an a priori cost is often used to adjust the learning process in pursuit of an optimal classification function. However, this a priori cost is often unknown and hard to estimate in medical decision making. METHODS: In this paper, we propose a new learning method, named RankCost, to classify imbalanced medical data without using a priori cost. Instead of focusing on improving class-prediction accuracy, RankCost maximizes the difference between the minority class and the majority class by using a scoring function, which translates the imbalanced classification problem into a partial ranking problem. The scoring function is learned via a non-parametric boosting algorithm. RESULTS: We compare RankCost to several representative approaches on four medical data sets varying in size, imbalance ratio, and dimension. The experimental results demonstrate that, unlike currently available methods, which often perform unevenly with different a priori costs, RankCost shows comparable performance in a consistent manner. CONCLUSIONS: Learning an effective classification model from imbalanced data is a challenging task in medical data analysis. Traditional approaches often use an a priori cost to adjust the learning of the classification function. This work presents a novel approach, namely RankCost, for learning from imbalanced medical data sets without using a priori cost. The experimental results indicate that RankCost performs very well in imbalanced data classification and can be a useful method in real-world applications of medical decision making.
format	Online Article Text
id	pubmed-4261533
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4261533 2014-12-10 Learning to improve medical decision making from imbalanced data without a priori cost Wan, Xiang; Liu, Jiming; Cheung, William K; Tong, Tiejun BMC Med Inform Decis Mak Research Article BioMed Central 2014-12-05 /pmc/articles/PMC4261533/ /pubmed/25480146 http://dx.doi.org/10.1186/s12911-014-0111-9 Text en © Wan et al.; licensee BioMed Central Ltd. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Learning to improve medical decision making from imbalanced data without a priori cost
title_sort	learning to improve medical decision making from imbalanced data without a priori cost
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4261533/ https://www.ncbi.nlm.nih.gov/pubmed/25480146 http://dx.doi.org/10.1186/s12911-014-0111-9
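
The METHODS part of the description says that RankCost learns a scoring function that separates minority samples from majority samples, which turns classification into a partial ranking problem. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes a pairwise logistic loss and swaps the non-parametric boosting learner for a plain linear scorer trained by gradient descent. The function names (`pairwise_rank_accuracy`, `fit_linear_scorer`, `score`) and the toy data are hypothetical and chosen for illustration.

```python
import math

def pairwise_rank_accuracy(scores_pos, scores_neg):
    """Fraction of (minority, majority) pairs where the minority sample
    receives the higher score; ties count as half a correct pair."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

def score(w, x):
    """Linear scoring function f(x) = w . x (an assumption of this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_linear_scorer(pos, neg, lr=0.1, epochs=200):
    """Gradient descent on the mean pairwise logistic loss
    log(1 + exp(-(f(p) - f(n)))) over all minority/majority pairs.
    No misclassification cost is supplied anywhere."""
    dim = len(pos[0])
    w = [0.0] * dim
    n_pairs = len(pos) * len(neg)
    for _ in range(epochs):
        grad = [0.0] * dim
        for p in pos:
            for n in neg:
                diff = [a - b for a, b in zip(p, n)]
                margin = score(w, diff)
                # d/dw log(1 + exp(-margin)) = -diff / (1 + exp(margin))
                coef = -1.0 / (1.0 + math.exp(margin))
                for j in range(dim):
                    grad[j] += coef * diff[j]
        w = [wi - lr * gj / n_pairs for wi, gj in zip(w, grad)]
    return w

# Toy data, made up for illustration: 2 minority and 6 majority samples.
minority = [[2.0, 1.0], [1.5, 2.0]]
majority = [[0.0, 0.5], [0.5, 0.0], [1.0, 0.2],
            [0.2, 1.0], [0.3, 0.3], [0.8, 0.6]]

w = fit_linear_scorer(minority, majority)
acc = pairwise_rank_accuracy([score(w, x) for x in minority],
                             [score(w, x) for x in majority])
print("pairwise ranking accuracy:", acc)
```

Because the objective only compares minority samples against majority samples, the class ratio and any misclassification cost never enter the loss. A decision threshold on the learned score would still have to be chosen separately.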


Similar items