
Classification of breast cancer recurrence based on imputed data: a simulation study

Several studies have been conducted to classify various real-life events, but few are in medical fields, particularly on breast cancer recurrence using statistical techniques. To our knowledge, there is no reported comparison of statistical classification accuracy and classifiers’ discriminative ability...


Bibliographic Details
Main authors: Abassi, Rahibu A., Msengwa, Amina S.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9727846/
https://www.ncbi.nlm.nih.gov/pubmed/36476234
http://dx.doi.org/10.1186/s13040-022-00316-8
_version_ 1784845113255526400
author Abassi, Rahibu A.
Msengwa, Amina S.
author_facet Abassi, Rahibu A.
Msengwa, Amina S.
author_sort Abassi, Rahibu A.
collection PubMed
description Several studies have been conducted to classify various real-life events, but few are in medical fields, particularly on breast cancer recurrence using statistical techniques. To our knowledge, there is no reported comparison of statistical classification accuracy and classifiers’ discriminative ability on breast cancer recurrence in the presence of imputed missing data. Therefore, this article aims to fill this gap by comparing the performance of binary classifiers (logistic regression, linear and quadratic discriminant analysis) on several datasets resulting from imputation under various simulation conditions. Our study adds to the knowledge of how classifiers’ accuracy and discriminative ability in classifying a binary outcome variable are affected by the presence of imputed numerical missing data. We simulated incomplete datasets with 15, 30, 45 and 60% missingness under Missing At Random (MAR) and Missing Completely At Random (MCAR) mechanisms. Mean imputation, hot deck, k-nearest neighbour, multiple imputation by chained equations, expectation-maximisation, and predictive mean matching were used to impute the incomplete datasets. For each classifier, classification accuracy and area under the Receiver Operating Characteristic (ROC) curve under MAR and MCAR mechanisms were compared. The linear discriminant classifier attained the highest classification accuracy (73.9%) based on mean-imputed data at 45% missingness under the MCAR mechanism. Logistic regression based on predictive mean matching imputed data yields the greatest area under the ROC curve (0.6418) at 30% missingness, while k-nearest neighbour tops the value (0.6428) at 60% missingness under the MCAR mechanism.
format Online
Article
Text
id pubmed-9727846
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9727846 2022-12-08 Classification of breast cancer recurrence based on imputed data: a simulation study Abassi, Rahibu A. Msengwa, Amina S. BioData Min Research Several studies have been conducted to classify various real-life events, but few are in medical fields, particularly on breast cancer recurrence using statistical techniques. To our knowledge, there is no reported comparison of statistical classification accuracy and classifiers’ discriminative ability on breast cancer recurrence in the presence of imputed missing data. Therefore, this article aims to fill this gap by comparing the performance of binary classifiers (logistic regression, linear and quadratic discriminant analysis) on several datasets resulting from imputation under various simulation conditions. Our study adds to the knowledge of how classifiers’ accuracy and discriminative ability in classifying a binary outcome variable are affected by the presence of imputed numerical missing data. We simulated incomplete datasets with 15, 30, 45 and 60% missingness under Missing At Random (MAR) and Missing Completely At Random (MCAR) mechanisms. Mean imputation, hot deck, k-nearest neighbour, multiple imputation by chained equations, expectation-maximisation, and predictive mean matching were used to impute the incomplete datasets. For each classifier, classification accuracy and area under the Receiver Operating Characteristic (ROC) curve under MAR and MCAR mechanisms were compared. The linear discriminant classifier attained the highest classification accuracy (73.9%) based on mean-imputed data at 45% missingness under the MCAR mechanism. Logistic regression based on predictive mean matching imputed data yields the greatest area under the ROC curve (0.6418) at 30% missingness, while k-nearest neighbour tops the value (0.6428) at 60% missingness under the MCAR mechanism.
BioMed Central 2022-12-07 /pmc/articles/PMC9727846/ /pubmed/36476234 http://dx.doi.org/10.1186/s13040-022-00316-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Abassi, Rahibu A.
Msengwa, Amina S.
Classification of breast cancer recurrence based on imputed data: a simulation study
title Classification of breast cancer recurrence based on imputed data: a simulation study
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9727846/
https://www.ncbi.nlm.nih.gov/pubmed/36476234
http://dx.doi.org/10.1186/s13040-022-00316-8
work_keys_str_mv AT abassirahibua classificationofbreastcancerrecurrencebasedonimputeddataasimulationstudy
AT msengwaaminas classificationofbreastcancerrecurrencebasedonimputeddataasimulationstudy
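
The simulation design summarised in the description field (delete values under MCAR at 15, 30, 45 and 60%, impute them, fit a classifier, then compare accuracy and area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' code and does not use their data. It assumes synthetic data with a single numeric predictor, uses mean imputation only, fits a one-feature linear discriminant with a pooled variance, and evaluates in-sample for brevity. All function names, the sample size, the class proportion and the seed are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def make_mcar(values, rate, rng):
    """Blank out each value with probability `rate`, independently of the data (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def mean_impute(values):
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def fit_lda_1d(x, y):
    """One-feature linear discriminant: returns (slope, intercept) of the score."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    m0, m1 = statistics.fmean(x0), statistics.fmean(x1)
    # Pooled within-class variance, shared by both classes.
    pooled_var = (sum((v - m0) ** 2 for v in x0)
                  + sum((v - m1) ** 2 for v in x1)) / (len(x) - 2)
    prior1 = len(x1) / len(x)
    slope = (m1 - m0) / pooled_var
    intercept = (-(m1 ** 2 - m0 ** 2) / (2 * pooled_var)
                 + math.log(prior1 / (1 - prior1)))
    return slope, intercept


def accuracy(y_true, y_pred):
    """Share of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def run_simulation(n=400, rates=(0.15, 0.30, 0.45, 0.60), seed=0):
    """Return {missing rate: (accuracy, AUC)} for mean-imputed synthetic data."""
    rng = random.Random(seed)
    # Synthetic binary outcome and one predictor whose mean shifts with the class.
    y = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    x = [rng.gauss(1.0 if yi else 0.0, 1.0) for yi in y]
    results = {}
    for rate in rates:
        x_imp = mean_impute(make_mcar(x, rate, rng))
        slope, intercept = fit_lda_1d(x_imp, y)
        scores = [slope * xi + intercept for xi in x_imp]
        preds = [1 if s > 0 else 0 for s in scores]
        results[rate] = (accuracy(y, preds), auc(y, scores))
    return results


results = run_simulation()
for rate, (acc, area) in results.items():
    print(f"MCAR {rate:.0%}: accuracy={acc:.3f}, AUC={area:.3f}")
```

A fuller comparison along the lines of the study would swap in the other imputation methods and classifiers, add a MAR mechanism in which the chance of deletion depends on an observed variable, and score the classifiers on held-out data.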