Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study

IMPORTANCE: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care. OBJECTIVE:...

Bibliographic Details
Main authors:	Ng, Madelena Y., Youssef, Alaa, Miner, Adam S., Sarellano, Daniela, Long, Jin, Larson, David B., Hernandez-Boussard, Tina, Langlotz, Curtis P.
Format:	Online Article Text
Language:	English
Published:	American Medical Association 2023
Subjects:	Original Investigation
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10692863/ https://www.ncbi.nlm.nih.gov/pubmed/38039004 http://dx.doi.org/10.1001/jamanetworkopen.2023.45892

author	Ng, Madelena Y. Youssef, Alaa Miner, Adam S. Sarellano, Daniela Long, Jin Larson, David B. Hernandez-Boussard, Tina Langlotz, Curtis P.
author_sort	Ng, Madelena Y.
collection	PubMed
description	IMPORTANCE: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care. OBJECTIVE: To discern what constitutes high-quality and useful data sets for health and biomedical ML research purposes according to subject matter experts. DESIGN, SETTING, AND PARTICIPANTS: This qualitative study interviewed data set experts, particularly those who are creators and ML researchers. Semistructured interviews were conducted in English and remotely through a secure video conferencing platform between August 23, 2022, and January 5, 2023. A total of 93 experts were invited to participate. Twenty experts were enrolled and interviewed. Using purposive sampling, experts were affiliated with a diverse representation of 16 health data sets/databases across organizational sectors. Content analysis was used to evaluate survey information and thematic analysis was used to analyze interview data. MAIN OUTCOMES AND MEASURES: Data set experts’ perceptions on what makes data sets AI ready. RESULTS: Participants included 20 data set experts (11 [55%] men; mean [SD] age, 42 [11] years), of whom all were health data set creators, and 18 of the 20 were also ML researchers. Themes (3 main and 11 subthemes) were identified and integrated into an AI-readiness framework to show their association within the health data ecosystem. Participants partially determined the AI readiness of data sets using priority appraisal elements of accuracy, completeness, consistency, and fitness. Ethical acquisition and societal impact emerged as appraisal considerations in that participant samples have not been described to date in prior data quality frameworks. 
Factors that drive creation of high-quality health data sets and mitigate risks associated with data reuse in ML research were also relevant to AI readiness. The state of data availability, data quality standards, documentation, team science, and incentivization were associated with elements of AI readiness and the overall perception of data set usefulness. CONCLUSIONS AND RELEVANCE: In this qualitative study of data set experts, participants contributed to the development of a grounded framework for AI data set quality. Data set AI readiness required the concerted appraisal of many elements and the balancing of transparency and ethical reflection against pragmatic constraints. The movement toward more reliable, relevant, and ethical AI and ML applications for patient care will inevitably require strategic updates to data set creation practices.
format	Online Article Text
id	pubmed-10692863
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	American Medical Association
record_format	MEDLINE/PubMed
spelling	pubmed-10692863 2023-12-03 JAMA Netw Open Original Investigation American Medical Association 2023-12-01 /pmc/articles/PMC10692863/ /pubmed/38039004 http://dx.doi.org/10.1001/jamanetworkopen.2023.45892 Text en Copyright 2023 Ng MY et al. JAMA Network Open. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
title	Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study
topic	Original Investigation
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10692863/ https://www.ncbi.nlm.nih.gov/pubmed/38039004 http://dx.doi.org/10.1001/jamanetworkopen.2023.45892
