
Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services

In recent years, due to an increase in the incidence of different cancers, various data sources are available in this field. Consequently, many researchers have become interested in the discovery of useful knowledge from available data to assist faster decision-making by doctors and reduce the negat...


Bibliographic Details
Main authors:	Sajjadnia, Zeinab; Khayami, Raof; Moosavi, Mohammad Reza
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2020
Subjects:	Original Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7262833/ https://www.ncbi.nlm.nih.gov/pubmed/32528221 http://dx.doi.org/10.1177/1176935120917955

_version_	1783540696133664768
author	Sajjadnia, Zeinab Khayami, Raof Moosavi, Mohammad Reza
author_facet	Sajjadnia, Zeinab Khayami, Raof Moosavi, Mohammad Reza
author_sort	Sajjadnia, Zeinab
collection	PubMed
description	In recent years, due to an increase in the incidence of different cancers, various data sources are available in this field. Consequently, many researchers have become interested in the discovery of useful knowledge from available data to assist faster decision-making by doctors and reduce the negative consequences of such diseases. Data mining includes a set of useful techniques in the discovery of knowledge from the data: detecting hidden patterns and finding unknown relations. However, these techniques face several challenges with real-world data. Particularly, dealing with inconsistencies, errors, noise, and missing values requires appropriate preprocessing and data preparation procedures. In this article, we investigate the impact of preprocessing to provide high-quality data for classification techniques. A wide range of preprocessing and data preparation methods are studied, and a set of preprocessing steps was leveraged to obtain appropriate classification results. The preprocessing is done on a real-world breast cancer dataset of the Reza Radiation Oncology Center in Mashhad with various features and a great percentage of null values, and the results are reported in this article. To evaluate the impact of the preprocessing steps on the results of classification algorithms, this case study was divided into the following 3 experiments: (1) breast cancer recurrence prediction without data preprocessing; (2) breast cancer recurrence prediction by error removal; and (3) breast cancer recurrence prediction by error removal and filling null values. Then, in each experiment, dimensionality reduction techniques are used to select a suitable subset of features for the problem at hand. Breast cancer recurrence prediction models are constructed using the 3 widely used classification algorithms, namely, naïve Bayes, k-nearest neighbor, and sequential minimal optimization. The evaluation of the experiments is done in terms of accuracy, sensitivity, F-measure, precision, and G-mean measures. Our results show that recurrence prediction is significantly improved after data preprocessing, especially in terms of sensitivity, F-measure, precision, and G-mean measures.
format	Online Article Text
id	pubmed-7262833
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	SAGE Publications
record_format	MEDLINE/PubMed
spelling	pubmed-7262833 2020-06-10 Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services Sajjadnia, Zeinab Khayami, Raof Moosavi, Mohammad Reza Cancer Inform Original Research In recent years, due to an increase in the incidence of different cancers, various data sources are available in this field. Consequently, many researchers have become interested in the discovery of useful knowledge from available data to assist faster decision-making by doctors and reduce the negative consequences of such diseases. Data mining includes a set of useful techniques in the discovery of knowledge from the data: detecting hidden patterns and finding unknown relations. However, these techniques face several challenges with real-world data. Particularly, dealing with inconsistencies, errors, noise, and missing values requires appropriate preprocessing and data preparation procedures. In this article, we investigate the impact of preprocessing to provide high-quality data for classification techniques. A wide range of preprocessing and data preparation methods are studied, and a set of preprocessing steps was leveraged to obtain appropriate classification results. The preprocessing is done on a real-world breast cancer dataset of the Reza Radiation Oncology Center in Mashhad with various features and a great percentage of null values, and the results are reported in this article. To evaluate the impact of the preprocessing steps on the results of classification algorithms, this case study was divided into the following 3 experiments: (1) breast cancer recurrence prediction without data preprocessing; (2) breast cancer recurrence prediction by error removal; and (3) breast cancer recurrence prediction by error removal and filling null values. Then, in each experiment, dimensionality reduction techniques are used to select a suitable subset of features for the problem at hand. Breast cancer recurrence prediction models are constructed using the 3 widely used classification algorithms, namely, naïve Bayes, k-nearest neighbor, and sequential minimal optimization. The evaluation of the experiments is done in terms of accuracy, sensitivity, F-measure, precision, and G-mean measures. Our results show that recurrence prediction is significantly improved after data preprocessing, especially in terms of sensitivity, F-measure, precision, and G-mean measures. SAGE Publications 2020-05-27 /pmc/articles/PMC7262833/ /pubmed/32528221 http://dx.doi.org/10.1177/1176935120917955 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
spellingShingle	Original Research Sajjadnia, Zeinab Khayami, Raof Moosavi, Mohammad Reza Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services
title	Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services
title_full	Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services
title_fullStr	Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services
title_full_unstemmed	Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services
title_short	Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services
title_sort	preprocessing breast cancer data to improve the data quality, diagnosis procedure, and medical care services
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7262833/ https://www.ncbi.nlm.nih.gov/pubmed/32528221 http://dx.doi.org/10.1177/1176935120917955
work_keys_str_mv	AT sajjadniazeinab preprocessingbreastcancerdatatoimprovethedataqualitydiagnosisprocedureandmedicalcareservices AT khayamiraof preprocessingbreastcancerdatatoimprovethedataqualitydiagnosisprocedureandmedicalcareservices AT moosavimohammadreza preprocessingbreastcancerdatatoimprovethedataqualitydiagnosisprocedureandmedicalcareservices
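
The description field in this record names accuracy, sensitivity, precision, F-measure, and G-mean as the evaluation measures but does not define them. As a rough illustration only, the sketch below computes the commonly used textbook forms of these measures from binary confusion-matrix counts, treating recurrence as the positive class. The function name `recurrence_metrics`, the example counts, and the choice of G-mean as the square root of sensitivity times specificity are assumptions for illustration; the article itself may use different variants.

```python
import math


def recurrence_metrics(tp, fp, tn, fn):
    """Compute common classification measures from confusion-matrix counts.

    Hypothetical helper: 'positive' is assumed to mean recurrence.
    """

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = ratio(tp, tp + fn)   # recall on the positive class
    specificity = ratio(tn, tn + fp)   # recall on the negative class
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Made-up counts, not results from the article.
example = recurrence_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can look high on an imbalanced dataset even when few recurrences are detected, which is one plausible reason for also reporting sensitivity and G-mean.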


Similar Items