
rCUR: an R package for CUR matrix decomposition

BACKGROUND: Many methods for dimensionality reduction of large data sets such as those generated in microarray studies boil down to the Singular Value Decomposition (SVD). Although singular vectors associated with the largest singular values have strong optimality properties and can often be quite u...


Bibliographic Details
Main Authors: Bodor, András, Csabai, István, Mahoney, Michael W, Solymosi, Norbert
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3546429/
https://www.ncbi.nlm.nih.gov/pubmed/22594948
http://dx.doi.org/10.1186/1471-2105-13-103
_version_ 1782256053990195200
author Bodor, András
Csabai, István
Mahoney, Michael W
Solymosi, Norbert
author_facet Bodor, András
Csabai, István
Mahoney, Michael W
Solymosi, Norbert
author_sort Bodor, András
collection PubMed
description BACKGROUND: Many methods for dimensionality reduction of large data sets such as those generated in microarray studies boil down to the Singular Value Decomposition (SVD). Although singular vectors associated with the largest singular values have strong optimality properties and can often be quite useful as a tool to summarize the data, they are linear combinations of up to all of the data points, and thus it is typically quite hard to interpret those vectors in terms of the application domain from which the data are drawn. Recently, an alternative dimensionality reduction paradigm, CUR matrix decompositions, has been proposed to address this problem and has been applied to genetic and internet data. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Since they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn. RESULTS: We present an implementation to perform CUR matrix decompositions, in the form of a freely available, open source R-package called rCUR. This package will help users to perform CUR-based analysis on large-scale data, such as those obtained from different high-throughput technologies, in an interactive and exploratory manner. We show two examples that illustrate how CUR-based techniques make it possible to reduce significantly the number of probes, while at the same time maintaining major trends in data and keeping the same classification accuracy. CONCLUSIONS: The package rCUR provides functions for the users to perform CUR-based matrix decompositions in the R environment. In gene expression studies, it gives an additional way of analysis of differential expression and discriminant gene selection based on the use of statistical leverage scores. These scores, which have been used historically in diagnostic regression analysis to identify outliers, can be used by rCUR to identify the most informative data points with respect to which to express the remaining data points.
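The abstract's two central ideas, statistical leverage scores and a decomposition built from actual columns and rows, can be illustrated with a small sketch. This is not rCUR's code or API: it is a NumPy illustration in which the function names (`leverage_scores`, `cur`) and parameters (`k`, `c`, `r`) are hypothetical. It also assumes a simplified deterministic rule that keeps the highest-scoring columns and rows, whereas published CUR algorithms typically sample them at random with probabilities based on the scores.

```python
import numpy as np


def leverage_scores(A, k, axis=1):
    """Normalized leverage scores relative to the top-k singular subspace.

    axis=1 scores the columns of A, axis=0 scores the rows.
    The scores are non-negative and sum to 1 (requires k <= rank bound).
    """
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    if axis == 1:
        # Squared entries of the top-k right singular vectors, per column.
        return np.sum(Vt[:k, :] ** 2, axis=0) / k
    # Squared entries of the top-k left singular vectors, per row.
    return np.sum(U[:, :k] ** 2, axis=1) / k


def cur(A, k, c, r):
    """Illustrative CUR: keep the c columns and r rows with the highest scores.

    Assumption: deterministic top-score selection (a simplification).
    Returns C, U, R and the selected column and row indices.
    """
    col_idx = np.sort(np.argsort(leverage_scores(A, k, axis=1))[::-1][:c])
    row_idx = np.sort(np.argsort(leverage_scores(A, k, axis=0))[::-1][:r])
    C = A[:, col_idx]          # actual columns of the data matrix
    R = A[row_idx, :]          # actual rows of the data matrix
    # U links C and R so that C @ U @ R approximates A.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, col_idx, row_idx


# Usage on a small synthetic matrix of exact rank 2.
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
C_demo, U_demo, R_demo, cols, rows = cur(A_demo, k=2, c=2, r=2)
print("selected columns:", cols, "selected rows:", rows)
print("relative error:",
      np.linalg.norm(A_demo - C_demo @ U_demo @ R_demo) / np.linalg.norm(A_demo))
```

Because `C` and `R` are slices of the data matrix itself, the selected indices point directly at concrete probes or samples, which is the interpretability advantage over singular vectors that the abstract describes.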
format Online
Article
Text
id pubmed-3546429
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3546429 2013-01-17 rCUR: an R package for CUR matrix decomposition Bodor, András Csabai, István Mahoney, Michael W Solymosi, Norbert BMC Bioinformatics Software BioMed Central 2012-05-17 /pmc/articles/PMC3546429/ /pubmed/22594948 http://dx.doi.org/10.1186/1471-2105-13-103 Text en Copyright ©2012 Bodor et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Bodor, András
Csabai, István
Mahoney, Michael W
Solymosi, Norbert
rCUR: an R package for CUR matrix decomposition
title rCUR: an R package for CUR matrix decomposition
title_full rCUR: an R package for CUR matrix decomposition
title_fullStr rCUR: an R package for CUR matrix decomposition
title_full_unstemmed rCUR: an R package for CUR matrix decomposition
title_short rCUR: an R package for CUR matrix decomposition
title_sort rcur: an r package for cur matrix decomposition
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3546429/
https://www.ncbi.nlm.nih.gov/pubmed/22594948
http://dx.doi.org/10.1186/1471-2105-13-103
work_keys_str_mv AT bodorandras rcuranrpackageforcurmatrixdecomposition
AT csabaiistvan rcuranrpackageforcurmatrixdecomposition
AT mahoneymichaelw rcuranrpackageforcurmatrixdecomposition
AT solymosinorbert rcuranrpackageforcurmatrixdecomposition