Bias and Class Imbalance in Oncologic Data—Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets

SIMPLE SUMMARY: Large-scale medical data carries significant areas of underrepresentation and bias at all levels: clinical, biological, and management. Resulting data sets and outcome measures reflect these shortcomings in clinical, imaging, and omics data with class imbalance emerging as the single...

Bibliographic Details
Main authors: Tasci, Erdal; Zhuge, Ying; Camphausen, Kevin; Krauze, Andra V.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9221277/
https://www.ncbi.nlm.nih.gov/pubmed/35740563
http://dx.doi.org/10.3390/cancers14122897
author Tasci, Erdal
Zhuge, Ying
Camphausen, Kevin
Krauze, Andra V.
collection PubMed
description SIMPLE SUMMARY: Large-scale medical data carries significant areas of underrepresentation and bias at all levels: clinical, biological, and management. Resulting data sets and outcome measures reflect these shortcomings in clinical, imaging, and omics data with class imbalance emerging as the single most significant issue inhibiting meaningful and reproducible conclusions while impacting the transfer of findings between the lab and clinic and limiting improvement in patient outcomes. When employing artificial intelligence methods, class imbalance can produce classifiers whose predicted class probabilities are geared toward the majority class ignoring the significance of minority classes, in turn generating algorithmic bias. The inability to mitigate this can guide an AI system in favor of or against various cohorts or variables. We review sources of bias and class imbalance and relate this to AI methods. We discuss avenues to mitigate these and propose a set of guidelines aimed at limiting and addressing data and algorithmic bias. ABSTRACT: Recent technological developments have led to an increase in the size and types of data in the medical field derived from multiple platforms such as proteomic, genomic, imaging, and clinical data. Many machine learning models have been developed to support precision/personalized medicine initiatives such as computer-aided detection, diagnosis, prognosis, and treatment planning by using large-scale medical data. Bias and class imbalance represent two of the most pressing challenges for machine learning-based problems, particularly in medical (e.g., oncologic) data sets, due to the limitations in patient numbers, cost, privacy, and security of data sharing, and the complexity of generated data. Depending on the data set and the research question, the methods applied to address class imbalance problems can provide more effective, successful, and meaningful results. This review discusses the essential strategies for addressing and mitigating the class imbalance problems for different medical data types in the oncologic domain.
format Online
Article
Text
id pubmed-9221277
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9221277 2022-06-24 Bias and Class Imbalance in Oncologic Data—Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets Tasci, Erdal Zhuge, Ying Camphausen, Kevin Krauze, Andra V. Cancers (Basel) Review MDPI 2022-06-12 /pmc/articles/PMC9221277/ /pubmed/35740563 http://dx.doi.org/10.3390/cancers14122897 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Bias and Class Imbalance in Oncologic Data—Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9221277/
https://www.ncbi.nlm.nih.gov/pubmed/35740563
http://dx.doi.org/10.3390/cancers14122897