
Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines

BACKGROUND: According to the 2017 WHO fact sheet, Coronary Artery Disease (CAD) is the leading cause of death worldwide and accounts for 31% of all fatalities. The unprecedented 17.6 million deaths caused by CAD in 2016 underscore the urgent need to facilitate proactive and accelerated pre-emptive...


Bibliographic Details
Main authors:	Saharan, Seema Singh; Nagar, Pankaj; Creasy, Kate Townsend; Stock, Eveline O.; Feng, James; Malloy, Mary J.; Kane, John P.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8050889/ https://www.ncbi.nlm.nih.gov/pubmed/33858484 http://dx.doi.org/10.1186/s13040-021-00260-z

_version_	1783679656374829056
author	Saharan, Seema Singh; Nagar, Pankaj; Creasy, Kate Townsend; Stock, Eveline O.; Feng, James; Malloy, Mary J.; Kane, John P.
author_facet	Saharan, Seema Singh; Nagar, Pankaj; Creasy, Kate Townsend; Stock, Eveline O.; Feng, James; Malloy, Mary J.; Kane, John P.
author_sort	Saharan, Seema Singh
collection	PubMed
description	BACKGROUND: According to the 2017 WHO fact sheet, Coronary Artery Disease (CAD) is the leading cause of death worldwide and accounts for 31% of all fatalities. The unprecedented 17.6 million deaths caused by CAD in 2016 underscore the urgent need for proactive, accelerated, pre-emptive diagnosis. Emerging Machine Learning (ML) techniques can be leveraged to facilitate early detection of CAD, which is a crucial factor in saving lives. Standard techniques such as angiography provide reliable evidence but are invasive and typically expensive and risky. In contrast, ML-model-generated diagnosis is non-invasive, fast, accurate and affordable. Therefore, ML algorithms can be used as a supplement or precursor to the conventional methods. This research demonstrates the implementation and comparative analysis of K Nearest Neighbor (k-NN) and Random Forest ML algorithms to achieve a targeted “At Risk” CAD classification using an emerging set of 35 cytokine biomarkers, which are strongly indicative predictive variables and potential targets for therapy. To improve generalizability, mechanisms such as data balancing and repeated k-fold cross-validation for hyperparameter tuning were integrated into the models. To assess how well the models separate “At Risk” CAD from Control, the Area Under the Receiver Operating Characteristic curve (AUROC) metric is used, which characterizes class discrimination through the tradeoff between the false positive and true positive rates. RESULTS: Two classifiers were developed, both built using 35 cytokine predictive features. The best AUROC score, 0.99 with a 95% Confidence Interval (CI) of (0.982, 0.999), was achieved by the Random Forest classifier using 35 cytokine biomarkers. The second-best AUROC score, 0.954 with a 95% CI of (0.929, 0.979), was achieved by the k-NN model using 35 cytokines. A p-value of less than 7.481e-10 obtained from an independent t-test indicated that the Random Forest classifier was significantly better than the k-NN classifier with regard to the AUROC score. Presently, as large-scale efforts are gaining momentum to enable early, fast, reliable, affordable, and accessible detection of individuals at risk for CAD, powerful ML algorithms can be leveraged as a supplement to conventional methods such as angiography. Early detection can be further improved by incorporating 65 novel and sensitive cytokine biomarkers. Investigation of the emerging role of cytokines in CAD can materially enhance the detection of risk and the discovery of mechanisms of disease that can lead to new therapeutic modalities.
format	Online Article Text
id	pubmed-8050889
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8050889 2021-04-19 Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines Saharan, Seema Singh Nagar, Pankaj Creasy, Kate Townsend Stock, Eveline O. Feng, James Malloy, Mary J. Kane, John P. BioData Min Research BioMed Central 2021-04-15 /pmc/articles/PMC8050889/ /pubmed/33858484 http://dx.doi.org/10.1186/s13040-021-00260-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Saharan, Seema Singh Nagar, Pankaj Creasy, Kate Townsend Stock, Eveline O. Feng, James Malloy, Mary J. Kane, John P. Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines
title	Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines
title_full	Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines
title_fullStr	Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines
title_full_unstemmed	Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines
title_short	Machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines
title_sort	machine learning and statistical approaches for classification of risk of coronary artery disease using plasma cytokines
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8050889/ https://www.ncbi.nlm.nih.gov/pubmed/33858484 http://dx.doi.org/10.1186/s13040-021-00260-z
work_keys_str_mv	AT saharanseemasingh machinelearningandstatisticalapproachesforclassificationofriskofcoronaryarterydiseaseusingplasmacytokines AT nagarpankaj machinelearningandstatisticalapproachesforclassificationofriskofcoronaryarterydiseaseusingplasmacytokines AT creasykatetownsend machinelearningandstatisticalapproachesforclassificationofriskofcoronaryarterydiseaseusingplasmacytokines AT stockevelineo machinelearningandstatisticalapproachesforclassificationofriskofcoronaryarterydiseaseusingplasmacytokines AT fengjames machinelearningandstatisticalapproachesforclassificationofriskofcoronaryarterydiseaseusingplasmacytokines AT malloymaryj machinelearningandstatisticalapproachesforclassificationofriskofcoronaryarterydiseaseusingplasmacytokines AT kanejohnp machinelearningandstatisticalapproachesforclassificationofriskofcoronaryarterydiseaseusingplasmacytokines
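
The abstract in the description field evaluates both classifiers by AUROC, the area under the curve that traces the true positive rate against the false positive rate as the decision threshold is swept. The following is a minimal Python sketch of that metric only, not the authors' implementation. The function names (roc_points, auroc) and the toy labels and scores are illustrative assumptions and do not come from the article or its data.

```python
def roc_points(labels, scores):
    """Return the (FPR, TPR) points produced by sweeping the threshold.

    labels: 1 for the positive class (e.g. "At Risk"), 0 for the negative class.
    scores: classifier scores, where higher means more likely positive.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")

    # Visit samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auroc(labels, scores):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(toy_labels, toy_scores))
```

An AUROC of 1.0 corresponds to perfect separation of the two classes, while 0.5 corresponds to scores that carry no ranking information. In practice a library routine such as scikit-learn's roc_auc_score would normally be used instead of a hand-written function.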
