A Factor Analysis Perspective on Linear Regression in the ‘More Predictors than Samples’ Case

Linear regression (LR) is a core model in supervised machine learning performing a regression task. One can fit this model using either an analytic/closed-form formula or an iterative algorithm. Fitting it via the analytic formula becomes a problem when the number of predictors is greater than the number of samples because the closed-form solution contains a matrix inverse that is not defined when having more predictors than samples. The standard approach to solve this issue is using the Moore–Penrose inverse or the L2 regularization. We propose another solution starting from a machine learning model that, this time, is used in unsupervised learning performing a dimensionality reduction task or just a density estimation one—factor analysis (FA)—with one-dimensional latent space. The density estimation task represents our focus since, in this case, it can fit a Gaussian distribution even if the dimensionality of the data is greater than the number of samples; hence, we obtain this advantage when creating the supervised counterpart of factor analysis, which is linked to linear regression. We also create its semisupervised counterpart and then extend it to be usable with missing data. We prove an equivalence to linear regression and create experiments for each extension of the factor analysis model. The resulting algorithms are either a closed-form solution or an expectation–maximization (EM) algorithm. The latter is linked to information theory by optimizing a function containing a Kullback–Leibler (KL) divergence or the entropy of a random variable.
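
The failure mode the abstract describes is easy to see concretely. Below is a minimal numpy sketch (illustrative only; the random data and the regularization strength lam = 1.0 are placeholder assumptions, not taken from the paper) of why the closed-form solution w = (X^T X)^(-1) X^T y breaks when there are more predictors than samples, together with the two standard remedies the abstract mentions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                     # more predictors (d) than samples (n)
X = rng.normal(size=(n, d))        # placeholder design matrix
y = rng.normal(size=n)             # placeholder targets

# Ordinary least squares closed form: w = (X^T X)^{-1} X^T y.
# X^T X is d x d but has rank at most n < d, so it is singular and
# np.linalg.inv(X.T @ X) raises an error or returns meaningless values.

# Standard remedy 1: Moore-Penrose pseudoinverse (minimum-norm solution).
w_pinv = np.linalg.pinv(X) @ y

# Standard remedy 2: L2 (ridge) regularization; adding lam * I makes the
# matrix positive definite, hence invertible, for any lam > 0.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```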

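For the unsupervised building block the abstract starts from, here is a hedged sketch of the classical EM algorithm for factor analysis with a one-dimensional latent space (the standard Ghahramani–Hinton style updates, not the supervised or semisupervised extensions the paper derives; the function name fa_em_1d and the hyperparameters are illustrative). The key point is that neither step inverts a d x d matrix, which is why the fit remains well defined when dimensionality exceeds the sample count:

```python
import numpy as np

def fa_em_1d(X, n_iter=200, seed=0):
    """EM for factor analysis with a one-dimensional latent space.

    Model: x = mu + w*z + eps,  z ~ N(0, 1),  eps ~ N(0, diag(psi)).
    Fits the Gaussian N(mu, w w^T + diag(psi)) without ever inverting
    a d x d matrix, so it stays well defined even when d > n.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                              # centered data, shape (n, d)
    w = rng.normal(scale=0.01, size=d)       # loading vector
    psi = Xc.var(axis=0) + 1e-6              # diagonal noise variances
    for _ in range(n_iter):
        # E-step: posterior q(z_i | x_i) = N(m_i, v), shared variance v
        w_over_psi = w / psi                 # Psi^{-1} w, shape (d,)
        v = 1.0 / (1.0 + w @ w_over_psi)     # scalar posterior variance
        m = v * (Xc @ w_over_psi)            # posterior means, shape (n,)
        sum_ezz = n * v + m @ m              # sum_i E[z_i^2]
        # M-step: closed-form updates for w and psi
        xm = Xc.T @ m                        # sum_i m_i * x_i, shape (d,)
        w = xm / sum_ezz
        psi = ((Xc ** 2).sum(axis=0) - w * xm) / n
        psi = np.maximum(psi, 1e-8)          # numerical floor
    return mu, w, psi
```

Calling fa_em_1d on a 20 x 100 matrix like the one in the previous sketch returns the parameters of a rank-one-plus-diagonal covariance w w^T + diag(psi), which is positive definite even though n < d; this is the density-estimation advantage the abstract builds on.
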
Bibliographic Details
Main Authors: Ciobanu, Sebastian; Ciortuz, Liviu
Format: Online Article Text
Language: English
Published: Entropy (Basel), MDPI, 2021 (online 2021-08-03)
Subjects: Article
Collection: PubMed (National Center for Biotechnology Information)
License: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8394575/
https://www.ncbi.nlm.nih.gov/pubmed/34441152
http://dx.doi.org/10.3390/e23081012