
Ordinal labels in machine learning: a user-centered approach to improve data validity in medical settings

BACKGROUND: Vagueness and uncertainty are intrinsic to any medical act, interpretation, and decision (including data reporting and the representation of relevant medical conditions), yet little research has focused on how to take this uncertainty into account explicitly. In this...


Bibliographic Details
Main authors: Seveso, Andrea; Campagner, Andrea; Ciucci, Davide; Cabitza, Federico
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7439656/
https://www.ncbi.nlm.nih.gov/pubmed/32819345
http://dx.doi.org/10.1186/s12911-020-01152-8
author Seveso, Andrea
Campagner, Andrea
Ciucci, Davide
Cabitza, Federico
collection PubMed
description BACKGROUND: Vagueness and uncertainty are intrinsic to any medical act, interpretation, and decision (including data reporting and the representation of relevant medical conditions), yet little research has focused on how to take this uncertainty into account explicitly. In this paper, we focus on the representation of a general and widespread medical terminology, grounded in a traditional and well-established convention, that represents the severity of health conditions (for instance, pain or visible signs) on a scale ranging from Absent to Extreme. Specifically, we study how both potential patients and doctors perceive the different levels of the terminology in quantitative and qualitative terms, and whether this embedded user knowledge can improve the representation of ordinal values in the construction of machine learning models.
METHODS: To this aim, we conducted a questionnaire-based study involving a relatively large sample of 1,152 potential patients and 31 clinicians, who were asked to represent numerically the perceived meaning of standard, widely applied labels used to describe health conditions. Using the collected values, we then present and discuss different possible fuzzy-set-based representations that address the vagueness of medical interpretation by taking into account the perceptions of domain experts. We also apply the findings of this user study to evaluate the impact of different encodings on the predictive performance of common machine learning models on a real-world medical prognostic task.
RESULTS: We found significant differences in the perception of pain levels between the two user groups. We also show that the proposed encodings can improve the performance of specific classes of models, and discuss when this is the case.
CONCLUSIONS: We hope that the proposed techniques for ordinal scale representation and ordinal encoding will be useful to the research community, and that our methodology will be applied to other widely used ordinal scales to improve the validity of datasets and the results of machine learning tasks.
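The abstract does not spell out the encodings themselves, so the following is only a minimal sketch of the general idea, under stated assumptions: the three middle labels of the scale, the 0-100 rating range, the sample ratings, and the triangular shape of the fuzzy sets are all hypothetical and are not taken from the paper. It contrasts a plain rank encoding with an encoding based on perceived values, and shows one simple way a label could be turned into a fuzzy set.

```python
from statistics import mean

# Hypothetical five-level severity scale. Only the endpoints "Absent" and
# "Extreme" are named in the abstract; the middle labels are assumed.
LEVELS = ["Absent", "Mild", "Moderate", "Severe", "Extreme"]

# Made-up ratings on an assumed 0-100 scale, standing in for questionnaire
# answers. These are NOT the study's data.
RESPONSES = {
    "Absent": [0, 0, 5],
    "Mild": [15, 20, 25],
    "Moderate": [40, 50, 45],
    "Severe": [70, 75, 80],
    "Extreme": [95, 100, 100],
}


def rank_encoding(label):
    """Plain ordinal encoding: the label's position on the scale."""
    return LEVELS.index(label)


def perceived_encoding(label):
    """Encode a label as its mean perceived rating, rescaled to [0, 1]."""
    return mean(RESPONSES[label]) / 100


def triangular_membership(x, label):
    """Membership of rating x in an assumed triangular fuzzy set for label.

    The triangle peaks at the mean rating and reaches zero at the lowest
    and highest ratings collected for that label.
    """
    ratings = RESPONSES[label]
    lo, hi, peak = min(ratings), max(ratings), mean(ratings)
    if x == peak:
        return 1.0
    if x <= lo or x >= hi:
        return 0.0
    if x < peak:
        return (x - lo) / (peak - lo)
    return (hi - x) / (hi - peak)


if __name__ == "__main__":
    for label in LEVELS:
        print(label, rank_encoding(label), round(perceived_encoding(label), 3))
```

With the rank encoding, adjacent labels are always one unit apart; with the perceived encoding, the spacing follows the ratings, so a model that is sensitive to feature scale sees the distances that respondents actually reported.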
format Online
Article
Text
id pubmed-7439656
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7439656 2020-08-24 BMC Med Inform Decis Mak Research BioMed Central 2020-08-20 /pmc/articles/PMC7439656/ /pubmed/32819345 http://dx.doi.org/10.1186/s12911-020-01152-8 Text en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Ordinal labels in machine learning: a user-centered approach to improve data validity in medical settings
topic Research