Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets
BACKGROUND: A key question when analyzing high throughput data is whether the information provided by the measured biological entities (gene, metabolite expression for example) is related to the experimental conditions, or, rather, to some interfering signals, such as experimental bias or artefacts....
Main authors: | Yao, Fangzhou; Coquery, Jeff; Lê Cao, Kim-Anh |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298499/ https://www.ncbi.nlm.nih.gov/pubmed/22305354 http://dx.doi.org/10.1186/1471-2105-13-24 |
_version_ | 1782226007328030720 |
---|---|
author | Yao, Fangzhou; Coquery, Jeff; Lê Cao, Kim-Anh |
author_facet | Yao, Fangzhou; Coquery, Jeff; Lê Cao, Kim-Anh |
author_sort | Yao, Fangzhou |
collection | PubMed |
description | BACKGROUND: A key question when analyzing high throughput data is whether the information provided by the measured biological entities (gene or metabolite expression, for example) is related to the experimental conditions or, rather, to some interfering signals, such as experimental bias or artefacts. Visualization tools are therefore useful to better understand the underlying structure of the data in a 'blind' (unsupervised) way. A well-established technique to do so is Principal Component Analysis (PCA). PCA is particularly powerful if the biological question is related to the highest variance. Independent Component Analysis (ICA) has been proposed as an alternative to PCA as it optimizes an independence condition to give more meaningful components. However, neither PCA nor ICA can overcome both the high dimensionality and noisy characteristics of biological data. RESULTS: We propose Independent Principal Component Analysis (IPCA), which combines the advantages of both PCA and ICA. It uses ICA as a denoising process of the loading vectors produced by PCA to better highlight the important biological entities and reveal insightful patterns in the data. The result is a better clustering of the biological samples on graphical representations. In addition, a sparse version (sIPCA) is proposed that performs an internal variable selection to identify biologically relevant features. CONCLUSIONS: On simulation studies and real data sets, we showed that IPCA offers a better visualization of the data than ICA and with a smaller number of components than PCA. Furthermore, a preliminary investigation of the list of genes selected with sIPCA demonstrates that the approach is well able to highlight relevant genes in the data with respect to the biological experiment. IPCA and sIPCA are both implemented in the R package mixOmics, dedicated to the analysis and exploration of high dimensional biological data sets, and on the mixOmics web interface. |
format | Online Article Text |
id | pubmed-3298499 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3298499 2012-03-12 Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets Yao, Fangzhou Coquery, Jeff Lê Cao, Kim-Anh BMC Bioinformatics Research Article BioMed Central 2012-02-03 /pmc/articles/PMC3298499/ /pubmed/22305354 http://dx.doi.org/10.1186/1471-2105-13-24 Text en Copyright ©2012 Yao et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Yao, Fangzhou Coquery, Jeff Lê Cao, Kim-Anh Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets |
title | Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298499/ https://www.ncbi.nlm.nih.gov/pubmed/22305354 http://dx.doi.org/10.1186/1471-2105-13-24 |
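
The description field above outlines the IPCA idea: run PCA, apply ICA to the PCA loading vectors as a denoising step, then project the samples onto the resulting independent loadings. The following is a minimal NumPy sketch of that outline, not the mixOmics implementation. The function name `ipca_sketch`, its parameters, the choice of symmetric FastICA with a tanh nonlinearity, and the ordering of components by kurtosis are all assumptions made for illustration; the record itself does not specify them.

```python
import numpy as np


def ipca_sketch(X, n_comp=2, n_iter=200, tol=1e-8, seed=0):
    """Illustrative IPCA-style pipeline: PCA loadings -> FastICA -> projection.

    Hypothetical helper; all names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # center each variable (column)

    # Step 1: PCA via SVD; the rows of Vt are the PCA loading vectors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_comp]  # shape (n_comp, p)

    # Step 2: center and whiten the loadings across the p variables.
    Z = V - V.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Z @ Z.T / Z.shape[1])
    Z = (E / np.sqrt(d)) @ E.T @ Z

    # Step 3: symmetric FastICA with the tanh nonlinearity (assumed choice).
    W, _ = np.linalg.qr(rng.normal(size=(n_comp, n_comp)))
    for _step in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G**2).mean(axis=1)) @ W
        U, _, Vh = np.linalg.svd(W_new)  # symmetric decorrelation
        W_new = U @ Vh
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break

    # Step 4: independent loading vectors, scaled to unit length.
    S = W @ Z
    S /= np.linalg.norm(S, axis=1, keepdims=True)

    # Step 5: order by excess kurtosis of the loadings (assumed criterion).
    kurt = (S**4).mean(axis=1) / (S**2).mean(axis=1) ** 2 - 3.0
    order = np.argsort(kurt)[::-1]
    S, kurt = S[order], kurt[order]

    # Step 6: project the centered samples onto the independent loadings.
    return Xc @ S.T, S, kurt
```

Calling `ipca_sketch(X, n_comp=2)` on a samples-by-variables matrix returns the sample coordinates, the independent loading vectors, and their kurtosis values. A sparse variant in the spirit of sIPCA would additionally shrink small loading entries toward zero, which this sketch does not attempt.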