
Sparse canonical methods for biological data integration: application to a cross-platform study

Bibliographic Details
Main Authors:	Lê Cao, Kim-Anh, Martin, Pascal GP, Robert-Granié, Christèle, Besse, Philippe
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2640358/ https://www.ncbi.nlm.nih.gov/pubmed/19171069 http://dx.doi.org/10.1186/1471-2105-10-34

author	Lê Cao, Kim-Anh Martin, Pascal GP Robert-Granié, Christèle Besse, Philippe
author_sort	Lê Cao, Kim-Anh
collection	PubMed
description	BACKGROUND: In the context of systems biology, few sparse approaches have been proposed so far to integrate several data sets. It is, however, an important and fundamental issue that will be widely encountered in post-genomic studies, when simultaneously analyzing transcriptomics, proteomics and metabolomics data using different platforms, so as to understand the mutual interactions between the different data sets. In this high-dimensional setting, variable selection is crucial to give interpretable results. We focus on a sparse Partial Least Squares approach (sPLS) to handle two-block data sets, where the relationship between the two types of variables is known to be symmetric. Sparse PLS has been developed either for a regression or a canonical correlation framework and includes a built-in procedure to select variables while integrating data. To illustrate the canonical mode approach, we analyzed the NCI60 data sets, where two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines. RESULTS: We compare the results obtained with two other sparse or related canonical correlation approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical correlation methods, which makes biological interpretation absolutely necessary to compare the different gene selections. We also propose comprehensive graphical representations of both samples and variables to facilitate the interpretation of the results. CONCLUSION: sPLS and CCA-EN selected highly relevant genes and complementary findings from the two data sets, which enabled a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches were found to bring similar results, although they highlighted the same phenomena with a different priority. They outperformed CIA, which tended to select redundant information.
format	Text
id	pubmed-2640358
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2640358 2009-02-12 BMC Bioinformatics Research Article BioMed Central 2009-01-26 /pmc/articles/PMC2640358/ /pubmed/19171069 http://dx.doi.org/10.1186/1471-2105-10-34 Text en Copyright © 2009 Lê Cao et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Sparse canonical methods for biological data integration: application to a cross-platform study
topic	Research Article
