Bias and Class Imbalance in Oncologic Data—Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets

SIMPLE SUMMARY: Large-scale medical data carries significant areas of underrepresentation and bias at all levels: clinical, biological, and management. Resulting data sets and outcome measures reflect these shortcomings in clinical, imaging, and omics data with class imbalance emerging as the single...

Bibliographic Details
Main Authors:	Tasci, Erdal; Zhuge, Ying; Camphausen, Kevin; Krauze, Andra V.
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Review
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9221277/ https://www.ncbi.nlm.nih.gov/pubmed/35740563 http://dx.doi.org/10.3390/cancers14122897

author	Tasci, Erdal Zhuge, Ying Camphausen, Kevin Krauze, Andra V.
author_facet	Tasci, Erdal Zhuge, Ying Camphausen, Kevin Krauze, Andra V.
author_sort	Tasci, Erdal
collection	PubMed
description	SIMPLE SUMMARY: Large-scale medical data carries significant areas of underrepresentation and bias at all levels: clinical, biological, and management. Resulting data sets and outcome measures reflect these shortcomings in clinical, imaging, and omics data with class imbalance emerging as the single most significant issue inhibiting meaningful and reproducible conclusions while impacting the transfer of findings between the lab and clinic and limiting improvement in patient outcomes. When employing artificial intelligence methods, class imbalance can produce classifiers whose predicted class probabilities are geared toward the majority class ignoring the significance of minority classes, in turn generating algorithmic bias. The inability to mitigate this can guide an AI system in favor of or against various cohorts or variables. We review sources of bias and class imbalance and relate this to AI methods. We discuss avenues to mitigate these and propose a set of guidelines aimed at limiting and addressing data and algorithmic bias. ABSTRACT: Recent technological developments have led to an increase in the size and types of data in the medical field derived from multiple platforms such as proteomic, genomic, imaging, and clinical data. Many machine learning models have been developed to support precision/personalized medicine initiatives such as computer-aided detection, diagnosis, prognosis, and treatment planning by using large-scale medical data. Bias and class imbalance represent two of the most pressing challenges for machine learning-based problems, particularly in medical (e.g., oncologic) data sets, due to the limitations in patient numbers, cost, privacy, and security of data sharing, and the complexity of generated data. Depending on the data set and the research question, the methods applied to address class imbalance problems can provide more effective, successful, and meaningful results. This review discusses the essential strategies for addressing and mitigating the class imbalance problems for different medical data types in the oncologic domain.
format	Online Article Text
id	pubmed-9221277
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9221277 2022-06-24 Bias and Class Imbalance in Oncologic Data—Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets Tasci, Erdal Zhuge, Ying Camphausen, Kevin Krauze, Andra V. Cancers (Basel) Review MDPI 2022-06-12 /pmc/articles/PMC9221277/ /pubmed/35740563 http://dx.doi.org/10.3390/cancers14122897 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Bias and Class Imbalance in Oncologic Data—Towards Inclusive and Transferrable AI in Large Scale Oncology Data Sets
topic	Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9221277/ https://www.ncbi.nlm.nih.gov/pubmed/35740563 http://dx.doi.org/10.3390/cancers14122897