Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods

Globally, cervical cancer remains one of the most prevalent cancers among women. It is therefore necessary to determine the relative importance of cervical cancer risk factors in order to identify potential patients. The present work proposes a cervical cancer prediction model (CCPM) that offers early prediction of...

Bibliographic Details
Main authors: Ijaz, Muhammad Fazal, Attique, Muhammad, Son, Youngdoo
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7284557/
https://www.ncbi.nlm.nih.gov/pubmed/32429090
http://dx.doi.org/10.3390/s20102809
_version_ 1783544495013363712
author Ijaz, Muhammad Fazal
Attique, Muhammad
Son, Youngdoo
collection PubMed
description Globally, cervical cancer remains one of the most prevalent cancers among women. It is therefore necessary to determine the relative importance of cervical cancer risk factors in order to identify potential patients. The present work proposes a cervical cancer prediction model (CCPM) that offers early prediction of cervical cancer using risk factors as inputs. The CCPM first removes outliers using an outlier detection method, either density-based spatial clustering of applications with noise (DBSCAN) or isolation forest (iForest). It then balances the dataset by over-sampling the minority class, using either the synthetic minority over-sampling technique (SMOTE) or SMOTE with Tomek links (SMOTETomek). Finally, it employs random forest (RF) as the classifier. The CCPM is thus evaluated in four scenarios: (1) DBSCAN + SMOTETomek + RF, (2) DBSCAN + SMOTE + RF, (3) iForest + SMOTETomek + RF, and (4) iForest + SMOTE + RF. A dataset of 858 potential patients was used to validate the performance of the proposed method. The combinations of iForest with SMOTE and iForest with SMOTETomek performed better than DBSCAN with SMOTE and DBSCAN with SMOTETomek. RF also performed best among several popular machine learning classifiers. Furthermore, the proposed CCPM showed better accuracy than previously proposed methods for predicting cervical cancer. In addition, a mobile application was developed that collects cervical cancer risk factor data and returns the CCPM result, enabling prompt and appropriate action at an early stage of cervical cancer.
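The three stages named in the description (outlier removal, over-sampling, RF classification) can be sketched as follows for the iForest + SMOTE scenario. This is only an illustrative sketch, not the authors' implementation: the data are synthetic (`make_classification` stands in for the 858-patient dataset), the class ratio, `contamination=0.05`, the hold-out split, and all function names are assumptions, and `smote_like` is a simplified stand-in for SMOTE written with NumPy and scikit-learn rather than the reference algorithm from a dedicated library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like(X, y, minority=1, k=5, seed=0):
    """Simplified SMOTE stand-in (binary labels assumed): create synthetic
    minority rows by interpolating between a minority row and one of its
    k nearest minority neighbours until both classes have equal size."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because each row is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[pick] - X_min[base])
    y_new = np.full(n_new, minority)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


def run_pipeline(seed=0):
    # Synthetic, imbalanced stand-in data (hypothetical 94/6 class ratio).
    X, y = make_classification(
        n_samples=858, n_features=10, n_informative=5,
        weights=[0.94, 0.06], random_state=seed,
    )
    # Assumption: a stratified hold-out split; only training rows are resampled.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Stage 1: drop rows that isolation forest flags as outliers (-1).
    iforest = IsolationForest(contamination=0.05, random_state=seed)
    keep = iforest.fit_predict(X_tr) == 1
    X_tr, y_tr = X_tr[keep], y_tr[keep]
    # Stage 2: over-sample the minority class until the classes are balanced.
    X_bal, y_bal = smote_like(X_tr, y_tr, seed=seed)
    # Stage 3: train the random forest and score it on untouched test rows.
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_bal, y_bal)
    return y_bal, clf.score(X_te, y_te)


if __name__ == "__main__":
    labels, accuracy = run_pipeline()
    print("class counts after over-sampling:", np.bincount(labels))
    print("hold-out accuracy:", accuracy)
```

Swapping `IsolationForest` for a DBSCAN-based filter (treating noise points as outliers), or following the over-sampling step with Tomek-link removal, would give rough analogues of the other three scenarios. Resampling only the training rows is a design choice made here so that the test score is not inflated by synthetic samples.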
format Online
Article
Text
id pubmed-7284557
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7284557 2020-06-15 Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods Ijaz, Muhammad Fazal Attique, Muhammad Son, Youngdoo Sensors (Basel) Article MDPI 2020-05-15 /pmc/articles/PMC7284557/ /pubmed/32429090 http://dx.doi.org/10.3390/s20102809 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods
topic Article