Robust Machine Learning for Colorectal Cancer Risk Prediction and Stratification

While colorectal cancer (CRC) is third in prevalence and mortality among cancers in the United States, there is no effective method to screen the general public for CRC risk. In this study, to identify an effective mass screening method for CRC risk, we evaluated seven supervised machine learning al...

Bibliographic Details
Main authors:	Nartowt, Bradley J., Hart, Gregory R., Muhammad, Wazir, Liang, Ying, Stark, Gigi F., Deng, Jun
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Big Data
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7931964/ https://www.ncbi.nlm.nih.gov/pubmed/33693381 http://dx.doi.org/10.3389/fdata.2020.00006

collection	PubMed
description	While colorectal cancer (CRC) is third in prevalence and mortality among cancers in the United States, there is no effective method to screen the general public for CRC risk. In this study, to identify an effective mass screening method for CRC risk, we evaluated seven supervised machine learning algorithms: linear discriminant analysis, support vector machine, naive Bayes, decision tree, random forest, logistic regression, and artificial neural network. Models were trained and cross-tested with the National Health Interview Survey (NHIS) and the Prostate, Lung, Colorectal, Ovarian Cancer Screening (PLCO) datasets. Six imputation methods were used to handle missing data: mean, Gaussian, Lorentzian, one-hot encoding, Gaussian expectation-maximization, and listwise deletion. Among all of the model configurations and imputation method combinations, the artificial neural network with expectation-maximization imputation emerged as the best, having a concordance of 0.70 ± 0.02, sensitivity of 0.63 ± 0.06, and specificity of 0.82 ± 0.04. In stratifying CRC risk in the NHIS and PLCO datasets, only 2% of negative cases were misclassified as high risk and 6% of positive cases were misclassified as low risk. In modeling the CRC-free probability with Kaplan-Meier estimators, low-, medium-, and high CRC-risk groups have statistically-significant separation. Our results indicated that the trained artificial neural network can be used as an effective screening tool for early intervention and prevention of CRC in large populations.
id	pubmed-7931964
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Front Big Data
published	2020-03-10
rights	Copyright © 2020 Nartowt, Hart, Muhammad, Liang, Stark and Deng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.