Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study
IMPORTANCE: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care. OBJECTIVE:...
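Main Authors: | Ng, Madelena Y.; Youssef, Alaa; Miner, Adam S.; Sarellano, Daniela; Long, Jin; Larson, David B.; Hernandez-Boussard, Tina; Langlotz, Curtis P. |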
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Medical Association 2023 |
Subjects: | Original Investigation |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10692863/ https://www.ncbi.nlm.nih.gov/pubmed/38039004 http://dx.doi.org/10.1001/jamanetworkopen.2023.45892 |
_version_ | 1785153035518869504 |
---|---|
author | Ng, Madelena Y.; Youssef, Alaa; Miner, Adam S.; Sarellano, Daniela; Long, Jin; Larson, David B.; Hernandez-Boussard, Tina; Langlotz, Curtis P. |
author_sort | Ng, Madelena Y. |
collection | PubMed |
description | IMPORTANCE: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care. OBJECTIVE: To discern what constitutes high-quality and useful data sets for health and biomedical ML research purposes according to subject matter experts. DESIGN, SETTING, AND PARTICIPANTS: This qualitative study interviewed data set experts, particularly those who are creators and ML researchers. Semistructured interviews were conducted in English and remotely through a secure video conferencing platform between August 23, 2022, and January 5, 2023. A total of 93 experts were invited to participate. Twenty experts were enrolled and interviewed. Using purposive sampling, experts were affiliated with a diverse representation of 16 health data sets/databases across organizational sectors. Content analysis was used to evaluate survey information and thematic analysis was used to analyze interview data. MAIN OUTCOMES AND MEASURES: Data set experts’ perceptions on what makes data sets AI ready. RESULTS: Participants included 20 data set experts (11 [55%] men; mean [SD] age, 42 [11] years), of whom all were health data set creators, and 18 of the 20 were also ML researchers. Themes (3 main and 11 subthemes) were identified and integrated into an AI-readiness framework to show their association within the health data ecosystem. Participants partially determined the AI readiness of data sets using priority appraisal elements of accuracy, completeness, consistency, and fitness. Ethical acquisition and societal impact emerged as appraisal considerations in that participant samples have not been described to date in prior data quality frameworks. 
Factors that drive creation of high-quality health data sets and mitigate risks associated with data reuse in ML research were also relevant to AI readiness. The state of data availability, data quality standards, documentation, team science, and incentivization were associated with elements of AI readiness and the overall perception of data set usefulness. CONCLUSIONS AND RELEVANCE: In this qualitative study of data set experts, participants contributed to the development of a grounded framework for AI data set quality. Data set AI readiness required the concerted appraisal of many elements and the balancing of transparency and ethical reflection against pragmatic constraints. The movement toward more reliable, relevant, and ethical AI and ML applications for patient care will inevitably require strategic updates to data set creation practices. |
format | Online Article Text |
id | pubmed-10692863 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Medical Association |
record_format | MEDLINE/PubMed |
spelling | pubmed-10692863 2023-12-03 Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study. Ng, Madelena Y.; Youssef, Alaa; Miner, Adam S.; Sarellano, Daniela; Long, Jin; Larson, David B.; Hernandez-Boussard, Tina; Langlotz, Curtis P. JAMA Netw Open. Original Investigation. American Medical Association 2023-12-01 /pmc/articles/PMC10692863/ /pubmed/38039004 http://dx.doi.org/10.1001/jamanetworkopen.2023.45892 Text en Copyright 2023 Ng MY et al. JAMA Network Open. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License. |
title | Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study |
topic | Original Investigation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10692863/ https://www.ncbi.nlm.nih.gov/pubmed/38039004 http://dx.doi.org/10.1001/jamanetworkopen.2023.45892 |