
Machine Learning Versus Logistic Regression Methods for 2-Year Mortality Prognostication in a Small, Heterogeneous Glioma Database


Bibliographic Details
Main authors:	Panesar, Sandip S.; D'Souza, Rhett N.; Yeh, Fang-Cheng; Fernandez-Miranda, Juan C.
Format:	Online article (text)
Language:	English
Published:	Elsevier, 2019
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581022/ https://www.ncbi.nlm.nih.gov/pubmed/31218287 http://dx.doi.org/10.1016/j.wnsx.2019.100012

Abstract:
BACKGROUND: Machine learning (ML) is the application of specialized algorithms to datasets for trend delineation, categorization, or prediction. ML techniques have traditionally been applied to large, highly dimensional databases. Gliomas are a heterogeneous group of primary brain tumors, traditionally graded using histopathologic features. Recently, the World Health Organization proposed a novel grading system for gliomas that incorporates molecular characteristics. We aimed to study whether ML could achieve accurate prognostication of 2-year mortality in a small, highly dimensional database of patients with glioma.
METHODS: We applied 3 ML techniques (artificial neural networks [ANNs], decision trees [DTs], and support vector machines [SVMs]) and classical logistic regression (LR) to a dataset of 76 patients with glioma of all grades. We compared applying the algorithms to the raw database versus a database in which only statistically significant features were included in the algorithmic inputs (feature selection).
RESULTS: Raw input consisted of 21 variables and achieved the following accuracy / area under the curve (AUC) with confidence interval (C.I.): 70.7%/0.70 (49.9–88.5) for ANN, 68%/0.72 (53.4–90.4) for SVM, 66.7%/0.64 (43.6–85.0) for LR, and 65%/0.70 (51.6–89.5) for DT. Feature-selected input consisted of 14 variables and achieved 73.4%/0.75 (62.9–87.9) for ANN, 73.3%/0.74 (62.1–87.4) for SVM, 69.3%/0.73 (60.0–85.8) for LR, and 65.2%/0.63 (49.1–76.9) for DT.
CONCLUSIONS: We demonstrate that these techniques can also be applied to small, highly dimensional datasets. Our ML techniques achieved reasonable performance compared with similar studies in the literature. Although local databases may be small compared with larger cancer repositories, we demonstrate that ML techniques can still be applied to their analysis; however, traditional statistical methods offer similar benefit.
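The abstract reports each model as accuracy / AUC. As a minimal sketch (not the authors' code, which this record does not include), both metrics can be computed from binary 2-year mortality labels and predicted probabilities as follows. The function names, the 0.5 decision threshold, and the example data are hypothetical; AUC is computed through its rank (Mann-Whitney) interpretation, which is equivalent to the area under the ROC curve.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of patients whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption for illustration only.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("labels and probabilities must be non-empty and equal length")
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a randomly chosen positive case (died within
    2 years) receives a higher score than a randomly chosen negative case;
    tied scores count as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Quadratic pairwise comparison is fine at the scale of a 76-patient cohort.
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = died within 2 years) and model probabilities.
    y_true = [1, 1, 0, 0, 1, 0]
    y_prob = [0.9, 0.4, 0.35, 0.2, 0.7, 0.6]
    print(round(accuracy(y_true, y_prob), 3), round(roc_auc(y_true, y_prob), 3))
    # prints: 0.667 0.889
```

Accuracy depends on the chosen threshold while AUC does not, which is one reason a model can rank differently on the two metrics (as the decision tree does in the raw-input results). The sketch does not reproduce the confidence intervals, since the abstract does not state how they were computed.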
Journal:	World Neurosurg X
Published online:	2019-01-24
Rights:	© 2019 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

