Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study

IMPORTANCE: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care. OBJECTIVE:...

Bibliographic Details
Main Authors: Ng, Madelena Y., Youssef, Alaa, Miner, Adam S., Sarellano, Daniela, Long, Jin, Larson, David B., Hernandez-Boussard, Tina, Langlotz, Curtis P.
Format: Online Article Text
Language: English
Published: American Medical Association 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10692863/
https://www.ncbi.nlm.nih.gov/pubmed/38039004
http://dx.doi.org/10.1001/jamanetworkopen.2023.45892
author Ng, Madelena Y.
Youssef, Alaa
Miner, Adam S.
Sarellano, Daniela
Long, Jin
Larson, David B.
Hernandez-Boussard, Tina
Langlotz, Curtis P.
collection PubMed
description IMPORTANCE: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care. OBJECTIVE: To discern what constitutes high-quality and useful data sets for health and biomedical ML research purposes according to subject matter experts. DESIGN, SETTING, AND PARTICIPANTS: This qualitative study interviewed data set experts, particularly those who are creators and ML researchers. Semistructured interviews were conducted in English and remotely through a secure video conferencing platform between August 23, 2022, and January 5, 2023. A total of 93 experts were invited to participate. Twenty experts were enrolled and interviewed. Using purposive sampling, experts were affiliated with a diverse representation of 16 health data sets/databases across organizational sectors. Content analysis was used to evaluate survey information and thematic analysis was used to analyze interview data. MAIN OUTCOMES AND MEASURES: Data set experts’ perceptions on what makes data sets AI ready. RESULTS: Participants included 20 data set experts (11 [55%] men; mean [SD] age, 42 [11] years), of whom all were health data set creators, and 18 of the 20 were also ML researchers. Themes (3 main and 11 subthemes) were identified and integrated into an AI-readiness framework to show their association within the health data ecosystem. Participants partially determined the AI readiness of data sets using priority appraisal elements of accuracy, completeness, consistency, and fitness. Ethical acquisition and societal impact emerged as appraisal considerations in that participant samples have not been described to date in prior data quality frameworks. 
Factors that drive creation of high-quality health data sets and mitigate risks associated with data reuse in ML research were also relevant to AI readiness. The state of data availability, data quality standards, documentation, team science, and incentivization were associated with elements of AI readiness and the overall perception of data set usefulness. CONCLUSIONS AND RELEVANCE: In this qualitative study of data set experts, participants contributed to the development of a grounded framework for AI data set quality. Data set AI readiness required the concerted appraisal of many elements and the balancing of transparency and ethical reflection against pragmatic constraints. The movement toward more reliable, relevant, and ethical AI and ML applications for patient care will inevitably require strategic updates to data set creation practices.
format Online
Article
Text
id pubmed-10692863
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Medical Association
record_format MEDLINE/PubMed
spelling pubmed-10692863 2023-12-03 Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study Ng, Madelena Y. Youssef, Alaa Miner, Adam S. Sarellano, Daniela Long, Jin Larson, David B. Hernandez-Boussard, Tina Langlotz, Curtis P. JAMA Netw Open Original Investigation American Medical Association 2023-12-01 /pmc/articles/PMC10692863/ /pubmed/38039004 http://dx.doi.org/10.1001/jamanetworkopen.2023.45892 Text en Copyright 2023 Ng MY et al. JAMA Network Open. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
title Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study
topic Original Investigation
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10692863/
https://www.ncbi.nlm.nih.gov/pubmed/38039004
http://dx.doi.org/10.1001/jamanetworkopen.2023.45892