
Missing value imputation in a data matrix using the regularised singular value decomposition

Some statistical analysis techniques may require complete data matrices, but a frequent problem in the construction of databases is the incomplete collection of information for different reasons. One option to tackle the problem is to estimate and impute the missing data. This paper describes a form...


Bibliographic Details
Main Authors:	Arciniegas-Alarcón, Sergio, García-Peña, Marisol, Krzanowski, Wojtek J., Rengifo, Camilo
Format:	Online Article Text
Language:	English
Published:	Elsevier 2023
Subjects:	Agricultural and Biological Science
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10407287/ https://www.ncbi.nlm.nih.gov/pubmed/37560402 http://dx.doi.org/10.1016/j.mex.2023.102289

author	Arciniegas-Alarcón, Sergio García-Peña, Marisol Krzanowski, Wojtek J. Rengifo, Camilo
author_sort	Arciniegas-Alarcón, Sergio
collection	PubMed
description	Some statistical analysis techniques may require complete data matrices, but a frequent problem in the construction of databases is the incomplete collection of information for different reasons. One option to tackle the problem is to estimate and impute the missing data. This paper describes a form of imputation that mixes regression with lower rank approximations. To improve the quality of the imputations, a generalisation is proposed that replaces the singular value decomposition (SVD) of the matrix with a regularised SVD in which the regularisation parameter is estimated by cross-validation. To evaluate the performance of the proposal, ten sets of real data from multienvironment trials were used. Missing values were created in each set at four percentages of missing not at random, and three criteria were then considered to investigate the effectiveness of the proposal. The results show that the regularised method proves very competitive when compared to the original method, beating it in several of the considered scenarios. As it is a very general system, its application can be extended to all multivariate data matrices. • The imputation method is modified through the inclusion of a stable and efficient computational algorithm that replaces the classical SVD least squares criterion by a penalised criterion. This penalty produces smoothed eigenvectors and eigenvalues that avoid overfitting problems, improving the performance of the method when the penalty is necessary. The size of the penalty can be determined by minimising one of the following criteria: the prediction errors, the Procrustes similarity statistic or the critical angles between subspaces of principal components.
format	Online Article Text
id	pubmed-10407287
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-10407287 2023-08-09 MethodsX Elsevier 2023-07-17 /pmc/articles/PMC10407287/ /pubmed/37560402 http://dx.doi.org/10.1016/j.mex.2023.102289 Text en © 2023 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Missing value imputation in a data matrix using the regularised singular value decomposition
topic	Agricultural and Biological Science
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10407287/ https://www.ncbi.nlm.nih.gov/pubmed/37560402 http://dx.doi.org/10.1016/j.mex.2023.102289
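
The description field above mentions replacing the SVD with a regularised SVD whose penalty size is tuned by cross-validation. The record does not give the algorithm itself, so the following is only a rough illustration of the general idea and not the authors' method: it swaps in a generic iterative low-rank imputation in which the leading singular values are soft-thresholded, and it picks the penalty by hiding a random subset of observed cells and scoring the prediction error. The function names `regularised_svd_impute` and `choose_lambda`, the parameters `rank`, `lam` and `holdout_frac`, and the simulated data are all assumptions made for this sketch.

```python
import numpy as np


def regularised_svd_impute(X, rank=2, lam=0.0, n_iter=200, tol=1e-8):
    """Fill NaN cells of X by iterating a shrunken rank-`rank` SVD fit.

    Hypothetical helper: a generic soft-thresholded SVD scheme, not the
    algorithm of the catalogued paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    observed = ~missing

    # Initial fill: column means of observed cells (overall mean as fallback).
    counts = observed.sum(axis=0)
    sums = np.where(missing, 0.0, X).sum(axis=0)
    overall = sums.sum() / max(1, counts.sum())
    col_means = np.where(counts > 0, sums / np.maximum(counts, 1), overall)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Penalty step: shrink the leading singular values towards zero.
        s_shrunk = np.maximum(s[:rank] - lam, 0.0)
        low_rank = (U[:, :rank] * s_shrunk) @ Vt[:rank]
        # Only missing cells are overwritten; observed cells stay fixed.
        new_filled = np.where(missing, low_rank, X)
        change = np.linalg.norm(new_filled - filled)
        filled = new_filled
        if change < tol * max(1.0, np.linalg.norm(filled)):
            break
    return filled


def choose_lambda(X, lambdas, rank=2, holdout_frac=0.1, seed=0):
    """Pick `lam` by hiding some observed cells and scoring their RMSE."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    obs = np.argwhere(~np.isnan(X))
    n_hold = max(1, int(holdout_frac * len(obs)))
    held = obs[rng.choice(len(obs), size=n_hold, replace=False)]
    rows, cols = held[:, 0], held[:, 1]

    X_train = X.copy()
    X_train[rows, cols] = np.nan  # pretend these cells are missing

    errors = []
    for lam in lambdas:
        X_hat = regularised_svd_impute(X_train, rank=rank, lam=lam)
        rmse = np.sqrt(np.mean((X_hat[rows, cols] - X[rows, cols]) ** 2))
        errors.append(float(rmse))
    return lambdas[int(np.argmin(errors))], errors


if __name__ == "__main__":
    # Simulated near-rank-2 matrix with about 15% of cells removed at random.
    rng = np.random.default_rng(1)
    true = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 6))
    X = true + 0.05 * rng.normal(size=true.shape)
    X[rng.random(X.shape) < 0.15] = np.nan

    best_lam, errors = choose_lambda(X, [0.0, 0.1, 0.5, 1.0])
    completed = regularised_svd_impute(X, rank=2, lam=best_lam)
    print("chosen penalty:", best_lam)
    print("remaining NaNs:", int(np.isnan(completed).sum()))
```

Setting `lam=0.0` reduces the sketch to plain truncated-SVD imputation, which makes it easy to compare an unpenalised fit with a penalised one on the same held-out cells. The hold-out here is drawn completely at random and scored only by prediction error; the description also names the Procrustes statistic and angles between principal-component subspaces as criteria, which this sketch does not attempt.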

