The projection score - an evaluation criterion for variable subset selection in PCA visualization

Bibliographic Details
Main Authors: Fontes, Magnus; Soneson, Charlotte
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3167802/
https://www.ncbi.nlm.nih.gov/pubmed/21798031
http://dx.doi.org/10.1186/1471-2105-12-307
_version_ 1782211289085378560
author Fontes, Magnus
Soneson, Charlotte
author_facet Fontes, Magnus
Soneson, Charlotte
author_sort Fontes, Magnus
collection PubMed
description BACKGROUND: In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of the many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by only including the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization. RESULTS: We present the projection score, which is a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA. CONCLUSIONS: We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, that can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
format Online
Article
Text
id pubmed-3167802
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3167802 2011-09-07 The projection score - an evaluation criterion for variable subset selection in PCA visualization Fontes, Magnus Soneson, Charlotte BMC Bioinformatics Research Article BioMed Central 2011-07-28 /pmc/articles/PMC3167802/ /pubmed/21798031 http://dx.doi.org/10.1186/1471-2105-12-307 Text en Copyright ©2011 Fontes and Soneson; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Fontes, Magnus
Soneson, Charlotte
The projection score - an evaluation criterion for variable subset selection in PCA visualization
title The projection score - an evaluation criterion for variable subset selection in PCA visualization
title_full The projection score - an evaluation criterion for variable subset selection in PCA visualization
title_fullStr The projection score - an evaluation criterion for variable subset selection in PCA visualization
title_full_unstemmed The projection score - an evaluation criterion for variable subset selection in PCA visualization
title_short The projection score - an evaluation criterion for variable subset selection in PCA visualization
title_sort projection score - an evaluation criterion for variable subset selection in pca visualization
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3167802/
https://www.ncbi.nlm.nih.gov/pubmed/21798031
http://dx.doi.org/10.1186/1471-2105-12-307
work_keys_str_mv AT fontesmagnus theprojectionscoreanevaluationcriterionforvariablesubsetselectioninpcavisualization
AT sonesoncharlotte theprojectionscoreanevaluationcriterionforvariablesubsetselectioninpcavisualization
AT fontesmagnus projectionscoreanevaluationcriterionforvariablesubsetselectioninpcavisualization
AT sonesoncharlotte projectionscoreanevaluationcriterionforvariablesubsetselectioninpcavisualization