
PLS-Based and Regularization-Based Methods for the Selection of Relevant Variables in Non-targeted Metabolomics Data

Non-targeted metabolomics is a part of systems biology and aims at determining numerous metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are high-dimensional because of the sensitivity of mass spectrometry-based detection methods as well as c...


Bibliographic Details
Main Authors:	Bujak, Renata; Daghir-Wojtkowiak, Emilia; Kaliszan, Roman; Markuszewski, Michał J.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2016
Subjects:	Molecular Biosciences
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4960252/ https://www.ncbi.nlm.nih.gov/pubmed/27508208 http://dx.doi.org/10.3389/fmolb.2016.00035

author	Bujak, Renata; Daghir-Wojtkowiak, Emilia; Kaliszan, Roman; Markuszewski, Michał J.
collection	PubMed
description	Non-targeted metabolomics is a part of systems biology and aims at determining numerous metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are high-dimensional because of the sensitivity of mass spectrometry-based detection methods and the complexity of biological matrices. Therefore, proper selection of the variables that contribute to group classification is a crucial step, especially in metabolomics studies focused on searching for disease biomarker candidates. In the present study, three statistical approaches were tested on two metabolomics datasets (the RH and PH studies): orthogonal projections to latent structures-discriminant analysis (OPLS-DA) without multiple testing correction, OPLS-DA with multiple testing correction, and the least absolute shrinkage and selection operator (LASSO) with bootstrapping. For the RH study, the OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on the VIP criteria using Pareto and UV scaling, respectively. For the PH study, 217 and 320 variables were selected based on the VIP criteria using Pareto and UV scaling, respectively. In the RH study, the OPLS-DA model built after correcting for multiple testing selected 4 and 19 variables using Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on the VIP criteria using Pareto and UV scaling, respectively. In the RH and PH studies, the LASSO selected 14 and 4 variables with reproducibility between 99.3 and 100%, respectively. For PLS-based models, the larger the search space, the higher the probability of developing models that fit the training data well but perform poorly on the validation set. The LASSO offers potential improvements over standard linear regression owing to its constraint, which promotes sparse solutions. This paper is the first to date to use LASSO penalized logistic regression in untargeted metabolomics studies.
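The idea of LASSO penalized logistic regression with bootstrapping, as named in the description above, can be illustrated with a minimal sketch. It is not the authors' implementation: the solver (proximal gradient descent with soft-thresholding), the penalty value `lam`, the step size, the toy data, and every function name here are assumptions chosen only for illustration. The sketch refits the model on resampled data and reports how often each variable receives a non-zero coefficient.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, step=0.1, n_iter=300):
    """Minimize mean log-loss + lam * sum(|w_j|) by proximal gradient descent.

    The intercept b is left unpenalized. lam, step and n_iter are
    illustrative assumptions, not values from the paper.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding yields exact zeros.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def bootstrap_selection_frequency(X, y, lam, n_boot=20, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        w, _ = fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def make_toy_data(n_per_class=20, n_noise=4, seed=1):
    """Hypothetical data: variable 0 separates the groups, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        center = 2.0 if label == 1 else -2.0
        for _ in range(n_per_class):
            row = [rng.gauss(center, 1.0)]
            row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
frequencies = bootstrap_selection_frequency(X, y, lam=0.15)
print(frequencies)
```

A variable that is selected in nearly every resample is a more stable candidate than one that appears only occasionally. In practice the penalty would normally be tuned, for example by cross-validation, and an established solver would replace this hand-written loop.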
format	Online Article Text
id	pubmed-4960252
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-4960252 2016-08-09 Front Mol Biosci Molecular Biosciences Frontiers Media S.A. 2016-07-26 /pmc/articles/PMC4960252/ /pubmed/27508208 http://dx.doi.org/10.3389/fmolb.2016.00035 Text en Copyright © 2016 Bujak, Daghir-Wojtkowiak, Kaliszan and Markuszewski. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	PLS-Based and Regularization-Based Methods for the Selection of Relevant Variables in Non-targeted Metabolomics Data
topic	Molecular Biosciences
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4960252/ https://www.ncbi.nlm.nih.gov/pubmed/27508208 http://dx.doi.org/10.3389/fmolb.2016.00035
