
Machine Learning Versus Logistic Regression Methods for 2-Year Mortality Prognostication in a Small, Heterogeneous Glioma Database

BACKGROUND: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization, or prediction. ML techniques have been traditionally applied to large, highly dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditiona...


Bibliographic Details
Main authors: Panesar, Sandip S., D'Souza, Rhett N., Yeh, Fang-Cheng, Fernandez-Miranda, Juan C.
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581022/
https://www.ncbi.nlm.nih.gov/pubmed/31218287
http://dx.doi.org/10.1016/j.wnsx.2019.100012
author Panesar, Sandip S.
D'Souza, Rhett N.
Yeh, Fang-Cheng
Fernandez-Miranda, Juan C.
collection PubMed
description BACKGROUND: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization, or prediction. ML techniques have been traditionally applied to large, highly dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathologic features. Recently, the World Health Organization proposed a novel grading system for gliomas incorporating molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, highly dimensional database of patients with glioma. METHODS: We applied 3 ML techniques (artificial neural networks [ANNs], decision trees [DTs], and support vector machines [SVMs]) and classical logistic regression (LR) to a dataset consisting of 76 patients with glioma of all grades. We compared the effect of applying the algorithms to the raw database versus a database where only statistically significant features were included in the algorithmic inputs (feature selection). RESULTS: Raw input consisted of 21 variables and achieved accuracy/area under the curve (C.I.) of 70.7%/0.70 (49.9–88.5) for ANN, 68%/0.72 (53.4–90.4) for SVM, 66.7%/0.64 (43.6–85.0) for LR, and 65%/0.70 (51.6–89.5) for DT. Feature-selected input consisted of 14 variables and achieved performance of 73.4%/0.75 (62.9–87.9) for ANN, 73.3%/0.74 (62.1–87.4) for SVM, 69.3%/0.73 (60.0–85.8) for LR, and 65.2%/0.63 (49.1–76.9) for DT. CONCLUSIONS: We demonstrate that these techniques can also be applied to small, highly dimensional datasets. Our ML techniques achieved reasonable performance compared with similar studies in the literature. Although local databases may be small versus larger cancer repositories, we demonstrate that ML techniques can still be applied to their analysis; however, traditional statistical methods are of similar benefit.
format Online
Article
Text
id pubmed-6581022
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-6581022 2019-06-19 World Neurosurg X Original Article Elsevier 2019-01-24 /pmc/articles/PMC6581022/ /pubmed/31218287 http://dx.doi.org/10.1016/j.wnsx.2019.100012 Text en © 2019 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Machine Learning Versus Logistic Regression Methods for 2-Year Mortality Prognostication in a Small, Heterogeneous Glioma Database
topic Original Article