
Predicting diabetic retinopathy and identifying interpretable biomedical features using machine learning algorithms



Bibliographic details
Main authors: Tsao, Hsin-Yi, Chan, Pei-Ying, Su, Emily Chia-Yu
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6101083/
https://www.ncbi.nlm.nih.gov/pubmed/30367589
http://dx.doi.org/10.1186/s12859-018-2277-0
collection PubMed
description BACKGROUND: The risk factors for diabetic retinopathy (DR) have been investigated extensively in past studies, but it remains unknown which risk factors are more strongly associated with DR than others. If DR-related risk factors can be detected more accurately, early prevention strategies can be directed at the highest-risk population. The purpose of this study is to build a prediction model for DR in type 2 diabetes mellitus using data mining techniques, including support vector machines, decision trees, artificial neural networks, and logistic regression. RESULTS: Experimental results demonstrated that support vector machines outperformed the other machine learning algorithms, achieving an accuracy of 79.5% and an area under the receiver operating characteristic curve of 0.839 under a percentage split (i.e., the data set divided into 80% for training and 20% for testing). Evaluated by a three-way data split (i.e., the data set divided into 60% for training, 20% for validation, and 20% as an independent test set), our method obtained slightly lower performance than under the percentage split, which suggests that the three-way split better reflects real performance and guards against overestimation. Moreover, we applied approaches proposed in previous studies to our data set, and our prediction performance exceeded theirs on most evaluation measures. This supports our assumption that appropriate machine learning algorithms combined with discriminative clinical features can effectively detect diabetic retinopathy. CONCLUSIONS: Our method identifies use of insulin and duration of diabetes as novel interpretable features to assist clinical decisions in identifying populations at high risk of diabetic retinopathy. Each 1-year increase in the duration of diabetes mellitus (DM) increases the odds of having DR by 9.3%.
The odds of having DR are 3.561 times higher for patients who use insulin than for patients who do not. Our results could facilitate the development of clinical decision support systems for clinical practice in the future.
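The two evaluation schemes and the odds-ratio figures in the abstract can be illustrated with a short, standard-library-only Python sketch. It is not the authors' code: the function names `three_way_split` and `odds_multiplier` are hypothetical, the unstratified random shuffle and fixed seed are assumptions, and reading "increased by 9.3%" as a per-year odds ratio of 1.093 is an interpretation of the abstract.

```python
import random


def three_way_split(n_samples, train=0.6, val=0.2, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    The 60/20/20 defaults follow the abstract; the plain (unstratified)
    shuffle and the fixed seed are illustrative assumptions.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle of row indices
    n_train = round(n_samples * train)
    n_val = round(n_samples * val)
    return (idx[:n_train],                   # fit model parameters here
            idx[n_train:n_train + n_val],    # tune hyperparameters here
            idx[n_train + n_val:])           # touch only once, for the final score


def odds_multiplier(odds_ratio_per_unit, units):
    """Factor by which the odds change after `units` steps of a predictor.

    Under a logistic model the effect is multiplicative on the odds:
    OR ** units, equivalently exp(beta * units) with beta = ln(OR).
    """
    return odds_ratio_per_unit ** units


if __name__ == "__main__":
    train, val, test = three_way_split(1000)
    print(len(train), len(val), len(test))
    # Hypothetical reading of the abstract: OR = 1.093 per year of diabetes.
    print(round(odds_multiplier(1.093, 5), 2))  # about 1.56
```

For example, `three_way_split(1000, train=0.8, val=0.0)` reproduces an 80/20 percentage split with an empty validation part, which is the setting where tuning and testing share the same held-out data and scores can be overestimated. Note that an odds ratio multiplies the odds, not the probability, of DR.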
id pubmed-6101083
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
BMC Bioinformatics, Research article. BioMed Central, 2018-08-13.
© The Author(s). 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research