
Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches

BACKGROUND: About 90% of patients who have diabetes suffer from Type 2 DM (T2DM). Many studies suggest using the significant role of lncRNAs to improve the diagnosis of T2DM. Machine learning and Data Mining techniques are tools that can improve the analysis and interpretation or extraction of knowl...


Bibliographic Details
Main authors: Kazerouni, Faranak, Bayani, Azadeh, Asadi, Farkhondeh, Saeidi, Leyla, Parvizi, Nasrin, Mansoori, Zahra
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7451240/
https://www.ncbi.nlm.nih.gov/pubmed/32854616
http://dx.doi.org/10.1186/s12859-020-03719-8
author Kazerouni, Faranak
Bayani, Azadeh
Asadi, Farkhondeh
Saeidi, Leyla
Parvizi, Nasrin
Mansoori, Zahra
collection PubMed
description BACKGROUND: About 90% of patients with diabetes have Type 2 diabetes mellitus (T2DM). Many studies suggest that the significant role of lncRNAs can be used to improve the diagnosis of T2DM. Machine learning and data mining techniques are tools that can improve the analysis and interpretation of data or the extraction of knowledge from it. These techniques may enhance prognosis and diagnosis and thereby help reduce diseases such as T2DM. We applied four classification models, K-nearest neighbor (KNN), support vector machine (SVM), logistic regression, and artificial neural networks (ANN), to the diagnosis of T2DM and compared their diagnostic power. We ran the algorithms on six lncRNA variables (LINC00523, LINC00995, HCG27_201, TPT1-AS1, LY86-AS1, DKFZP) and demographic data. RESULTS: To select the best-performing model, we considered the AUC, sensitivity, and specificity, plotted the ROC curves, and showed the average curve and range. For KNN, the mean AUC was 91% with a standard deviation (SD) of 0.09; the mean sensitivity and specificity were 96% and 85%, respectively. For SVM, the mean AUC after stratified 10-fold cross-validation was 95% with an SD of 0.05; the mean sensitivity and specificity were 95% and 86%, respectively. For ANN, the mean AUC was 93% with an SD of 0.03; the mean sensitivity and specificity were 78% and 85%, respectively. Finally, for logistic regression, the mean AUC was 95% with an SD of 0.05; the mean sensitivity and specificity were 92% and 85%, respectively. According to the ROC curves, logistic regression and SVM had a larger area under the curve than the other models. CONCLUSION: We aimed to find the best data mining approach for predicting T2DM from the expression of six lncRNAs. According to the findings, SVM and logistic regression had the highest AUC; KNN and ANN also had high mean AUCs with small standard deviations. Among the approaches, KNN had the highest mean sensitivity, and SVM had the highest specificity. These results could improve our knowledge about the early detection and diagnosis of T2DM using lncRNAs as biomarkers.
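The evaluation described in the abstract (stratified 10-fold cross-validation, then the mean and SD of the per-fold AUC together with mean sensitivity and specificity) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the data are synthetic, the two features are hypothetical stand-ins for lncRNA expression values, and the small pure-Python KNN and all function names (`stratified_folds`, `auc`, `sensitivity_specificity`, `knn_scores`, `cross_validate`) are assumptions introduced here.

```python
import random
import statistics


def stratified_folds(labels, n_folds, seed=0):
    """Assign sample indices to folds so each fold keeps the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def knn_scores(train_X, train_y, test_X, k=5):
    """Score each test point by the fraction of positives among its k nearest neighbours."""
    scores = []
    for x in test_X:
        by_distance = sorted(
            range(len(train_X)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, train_X[j])),
        )
        scores.append(sum(train_y[j] for j in by_distance[:k]) / k)
    return scores


def cross_validate(X, y, n_folds=10, k=5, seed=0):
    """Run stratified k-fold CV and summarise the per-fold metrics."""
    aucs, sens, spec = [], [], []
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        scores = knn_scores(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
            k,
        )
        y_test = [y[i] for i in test_idx]
        y_pred = [int(s >= 0.5) for s in scores]  # 0.5 threshold is an assumption
        aucs.append(auc(y_test, scores))
        se, sp = sensitivity_specificity(y_test, y_pred)
        sens.append(se)
        spec.append(sp)
    return {
        "auc_mean": statistics.mean(aucs),
        "auc_sd": statistics.stdev(aucs),
        "sensitivity_mean": statistics.mean(sens),
        "specificity_mean": statistics.mean(spec),
    }


# Synthetic data (hypothetical): 100 controls around 0.0, 100 cases around 1.5.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
X += [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(100)]
y = [0] * 100 + [1] * 100

results = cross_validate(X, y, n_folds=10, k=5)
print(results)
```

Reporting the SD of the per-fold AUC next to its mean, as the abstract does, indicates how stable each model is across folds; the same loop could wrap any of the four classifiers in place of the KNN scorer used here.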
format Online
Article
Text
id pubmed-7451240
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7451240 2020-08-28 BMC Bioinformatics Research Article BioMed Central 2020-08-27 /pmc/articles/PMC7451240/ /pubmed/32854616 http://dx.doi.org/10.1186/s12859-020-03719-8 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7451240/
https://www.ncbi.nlm.nih.gov/pubmed/32854616
http://dx.doi.org/10.1186/s12859-020-03719-8