rCUR: an R package for CUR matrix decomposition
BACKGROUND: Many methods for dimensionality reduction of large data sets such as those generated in microarray studies boil down to the Singular Value Decomposition (SVD). Although singular vectors associated with the largest singular values have strong optimality properties and can often be quite u...
Main Authors: | Bodor, András; Csabai, István; Mahoney, Michael W; Solymosi, Norbert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3546429/ https://www.ncbi.nlm.nih.gov/pubmed/22594948 http://dx.doi.org/10.1186/1471-2105-13-103 |
_version_ | 1782256053990195200 |
---|---|
author | Bodor, András Csabai, István Mahoney, Michael W Solymosi, Norbert |
author_facet | Bodor, András Csabai, István Mahoney, Michael W Solymosi, Norbert |
author_sort | Bodor, András |
collection | PubMed |
description | BACKGROUND: Many methods for dimensionality reduction of large data sets, such as those generated in microarray studies, boil down to the Singular Value Decomposition (SVD). Although the singular vectors associated with the largest singular values have strong optimality properties and are often useful for summarizing the data, they are linear combinations of up to all of the data points, so it is typically hard to interpret them in terms of the application domain from which the data are drawn. Recently, an alternative dimensionality reduction paradigm, CUR matrix decomposition, has been proposed to address this problem and has been applied to genetic and internet data. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn. RESULTS: We present an implementation of CUR matrix decompositions in the form of a freely available, open source R package called rCUR. This package helps users perform CUR-based analysis on large-scale data, such as those obtained from different high-throughput technologies, in an interactive and exploratory manner. We show two examples that illustrate how CUR-based techniques make it possible to significantly reduce the number of probes while maintaining major trends in the data and keeping the same classification accuracy. CONCLUSIONS: The rCUR package provides functions for performing CUR-based matrix decompositions in the R environment. In gene expression studies, it offers an additional way to analyze differential expression and to select discriminant genes, based on statistical leverage scores. These scores, which have been used historically in diagnostic regression analysis to identify outliers, can be used by rCUR to identify the most informative data points, in terms of which the remaining data points are expressed. |
format | Online Article Text |
id | pubmed-3546429 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3546429 2013-01-17 rCUR: an R package for CUR matrix decomposition Bodor, András Csabai, István Mahoney, Michael W Solymosi, Norbert BMC Bioinformatics Software BioMed Central 2012-05-17 /pmc/articles/PMC3546429/ /pubmed/22594948 http://dx.doi.org/10.1186/1471-2105-13-103 Text en Copyright ©2012 Bodor et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Bodor, András Csabai, István Mahoney, Michael W Solymosi, Norbert rCUR: an R package for CUR matrix decomposition |
title | rCUR: an R package for CUR matrix decomposition |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3546429/ https://www.ncbi.nlm.nih.gov/pubmed/22594948 http://dx.doi.org/10.1186/1471-2105-13-103 |
work_keys_str_mv | AT bodorandras rcuranrpackageforcurmatrixdecomposition AT csabaiistvan rcuranrpackageforcurmatrixdecomposition AT mahoneymichaelw rcuranrpackageforcurmatrixdecomposition AT solymosinorbert rcuranrpackageforcurmatrixdecomposition |
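
Illustrative note: the description above says that CUR decompositions express a matrix through a small number of its actual columns and rows, chosen with the help of statistical leverage scores. The following is a minimal sketch of that idea in Python with NumPy. It is not the rCUR package and does not reproduce its interface; the function names (`leverage_scores`, `top_indices`, `cur_decomposition`) and parameters are hypothetical. As a simplifying assumption, it deterministically keeps the highest-scoring columns and rows, whereas CUR algorithms in the literature often sample them at random with probabilities proportional to the leverage scores.

```python
import numpy as np


def leverage_scores(A, k, axis=1):
    """Normalized leverage scores relative to the top-k singular subspace.

    axis=1 scores the columns of A (using right singular vectors),
    axis=0 scores the rows of A (using left singular vectors).
    The scores are non-negative and sum to 1.
    """
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    if axis == 1:
        return (Vt[:k, :] ** 2).sum(axis=0) / k
    return (U[:, :k] ** 2).sum(axis=1) / k


def top_indices(scores, count):
    """Indices of the `count` largest scores, returned in ascending order."""
    return np.sort(np.argsort(scores)[::-1][:count])


def cur_decomposition(A, k, n_cols, n_rows):
    """Sketch of a CUR decomposition, A ~ C @ U @ R.

    Assumption: columns and rows are picked deterministically by
    highest leverage score rather than by random sampling.
    """
    col_idx = top_indices(leverage_scores(A, k, axis=1), n_cols)
    row_idx = top_indices(leverage_scores(A, k, axis=0), n_rows)
    C = A[:, col_idx]  # actual columns of A
    R = A[row_idx, :]  # actual rows of A
    # U links C and R so that C @ U @ R approximates A.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R, col_idx, row_idx


# Small demonstration on a synthetic rank-2 matrix (made-up data).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
C, U, R, col_idx, row_idx = cur_decomposition(A, k=2, n_cols=2, n_rows=2)
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print("selected columns:", col_idx, "selected rows:", row_idx)
print("relative reconstruction error:", rel_err)
```

Because `C` and `R` consist of actual columns and rows of the input, the selected indices can be mapped straight back to, for example, probes and samples in an expression matrix, which is the interpretability advantage the description attributes to CUR over SVD.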