
Diagnostics for Statistical Variable Selection Methods for Prediction of Peptic Ulcer Disease in Helicobacter pylori Infection

BACKGROUND: The development of accurate classification models depends upon the methods used to identify the most relevant variables. The aim of this article is to evaluate variable selection methods to identify important variables in predicting a binary response using nonlinear statistical models. O...


Bibliographic Details
Main authors: Ju, Hyunsu; Brasier, Allan R; Kurosky, Alexander; Xu, Bo; Reyes, Victor E; Graham, David Y
Format: Online Article Text
Language: English
Published: 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4132894/
https://www.ncbi.nlm.nih.gov/pubmed/25132738
http://dx.doi.org/10.4172/jpb.1000308
collection PubMed
description BACKGROUND: The development of accurate classification models depends upon the methods used to identify the most relevant variables. The aim of this article is to evaluate variable selection methods to identify important variables in predicting a binary response using nonlinear statistical models. Our goals in model selection include producing non-overfitting stable models that are interpretable, that generate accurate predictions and have minimum bias. This work was motivated by data on clinical and laboratory features of Helicobacter pylori infections obtained from 60 individuals enrolled in a prospective observational study. RESULTS: We carried out a comprehensive performance comparison of several nonlinear classification models over the H. pylori data set. We compared variable selection results by Multivariate Adaptive Regression Splines (MARS), Logistic Regression with regularization, Generalized Additive Models (GAMs) and Bayesian Variable Selection in GAMs. We found that the MARS model approach has the highest predictive power because the nonlinearity assumptions of candidate predictors are strongly satisfied, a finding demonstrated via deviance chi-square testing procedures in GAMs. CONCLUSIONS: Our results suggest that the physiological free amino acids citrulline, histidine, lysine and arginine are the major features for predicting H. pylori peptic ulcer disease on the basis of amino acid profiling.
format Online
Article
Text
id pubmed-4132894
institution National Center for Biotechnology Information
language English
publishDate 2014
record_format MEDLINE/PubMed
spelling pubmed-4132894 2014-08-14 J Proteomics Bioinform Article 2014-03-28 2014-04-01 Text en Copyright: © 2014 Ju H, et al.
http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Article