
Integrating functional genomics data using maximum likelihood based simultaneous component analysis

BACKGROUND: In contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities that are collected with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of...


Bibliographic Details
Main authors:	van den Berg, Robert A; Van Mechelen, Iven; Wilderjans, Tom F; Van Deun, Katrijn; Kiers, Henk AL; Smilde, Age K
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2771021/ https://www.ncbi.nlm.nih.gov/pubmed/19835617 http://dx.doi.org/10.1186/1471-2105-10-340

author	van den Berg, Robert A; Van Mechelen, Iven; Wilderjans, Tom F; Van Deun, Katrijn; Kiers, Henk AL; Smilde, Age K
collection	PubMed
description	BACKGROUND: In contemporary biology, complex biological processes are increasingly studied by collecting and analyzing measurements of the same entities that are collected with different analytical platforms. Such data comprise a number of data blocks that are coupled via a common mode. The goal of collecting this type of data is to discover biological mechanisms that underlie the behavior of the variables in the different data blocks. The simultaneous component analysis (SCA) family of data analysis methods is suited for this task. However, an SCA may be hampered by the data blocks being subjected to different amounts of measurement error, or noise. To unveil the true mechanisms underlying the data, it could be fruitful to take noise heterogeneity into consideration in the data analysis. Maximum likelihood based SCA (MxLSCA-P) was developed for this purpose. In a previous simulation study it outperformed ordinary SCA-P. This previous study, however, did not mimic typical functional genomics data sets in several respects, such as data blocks coupled via the experimental mode, more variables than experimental units, and medium to high correlations between variables. Here, we present a new simulation study in which the usefulness of MxLSCA-P compared to ordinary SCA-P is evaluated within a typical functional genomics setting. Subsequently, the performance of the two methods is evaluated by analysis of a real-life Escherichia coli metabolomics data set. RESULTS: In the simulation study, MxLSCA-P outperforms SCA-P in terms of recovery of the true underlying scores of the common mode and of the true values underlying the data entries. The advantage of MxLSCA-P was especially pronounced when the simulated data blocks were subject to different noise levels. In the analysis of an E. coli metabolomics data set, MxLSCA-P provided a slightly better and more consistent interpretation. CONCLUSION: MxLSCA-P is a promising addition to the SCA family.
The analysis of coupled functional genomics data blocks could benefit from its ability to take different noise levels per data block into consideration and improve the recovery of the true patterns underlying the data. Moreover, the maximum likelihood based approach underlying MxLSCA-P could be extended to custom-made solutions for specific problems.
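The abstract contrasts ordinary SCA-P with a maximum likelihood variant that allows each data block its own noise level. The NumPy sketch below illustrates that contrast under stated assumptions; it is not the authors' published algorithm. It assumes the blocks share their rows (the experimental units, i.e. the common mode), that the model is X_k ≈ T P_k' with one score matrix T for all blocks, and that each block's noise is i.i.d. Gaussian with a block-specific variance σ_k². Under those assumptions the likelihood can be maximized by alternating between (a) a weighted SCA-P, computed as a truncated SVD of the concatenated blocks X_k/σ_k, and (b) setting each σ_k² to its block's mean squared residual. The function names `sca_p` and `mxlsca_p` and the parameters `max_iter`, `tol` and `var_floor` are hypothetical.

```python
import numpy as np


def sca_p(blocks, n_components, weights=None):
    """Least-squares SCA-P sketch: X_k ~= T @ P_k.T with one T shared by all blocks.

    blocks: list of arrays X_k of shape (I, J_k) that share their rows
            (the common mode, e.g. experimental units).
    weights: optional per-block scale w_k; block k is divided by w_k before
             the SVD and its loadings are multiplied by w_k afterwards.
    Returns scores T (I x R, orthonormal columns) and loadings P_k (J_k x R).
    """
    if weights is None:
        weights = np.ones(len(blocks))
    # Concatenate the (scaled) blocks along the variable mode.
    scaled = np.hstack([x / w for x, w in zip(blocks, weights)])
    # Truncated SVD gives the best rank-R least-squares fit (Eckart-Young).
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)
    scores = u[:, :n_components]
    stacked = vt[:n_components].T * s[:n_components]
    # Split the stacked loadings back into blocks and undo the scaling.
    cuts = np.cumsum([x.shape[1] for x in blocks])[:-1]
    loadings = [p * w for p, w in zip(np.split(stacked, cuts), weights)]
    return scores, loadings


def mxlsca_p(blocks, n_components, max_iter=500, tol=1e-10, var_floor=1e-12):
    """Alternating ML sketch (hypothetical), assuming N(0, sigma_k^2) noise per block.

    Minimizes sum_k [ (I*J_k/2) * log(sigma_k^2) + ||X_k - T P_k'||^2 / (2 sigma_k^2) ].
    """
    n_rows = blocks[0].shape[0]
    sizes = np.array([n_rows * x.shape[1] for x in blocks])
    sigma2 = np.ones(len(blocks))  # equal variances: the first pass is plain SCA-P
    previous = np.inf
    for _ in range(max_iter):
        # (a) Given sigma_k, the likelihood is maximized by SCA-P on X_k / sigma_k.
        scores, loadings = sca_p(blocks, n_components, weights=np.sqrt(sigma2))
        # (b) Given T and P_k, the ML variance is the block's mean squared residual.
        resid = np.array([np.sum((x - scores @ p.T) ** 2)
                          for x, p in zip(blocks, loadings)])
        sigma2 = np.maximum(resid / sizes, var_floor)  # floor avoids log(0)
        loss = np.sum(0.5 * sizes * np.log(sigma2) + resid / (2.0 * sigma2))
        # Both steps are exact minimizers, so the loss cannot increase.
        if previous - loss < tol * max(1.0, abs(loss)):
            break
        previous = loss
    return scores, loadings, sigma2
```

Because every σ_k starts at 1, the first pass is ordinary SCA-P; later passes down-weight the noisier blocks, which is the intuition behind the reported gains when blocks differ in noise level. The `var_floor` guard is an added assumption: without it, a block that the components fit perfectly would drive its σ_k² to zero and the likelihood would diverge.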
format	Text
id	pubmed-2771021
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2771021 2009-10-31 Integrating functional genomics data using maximum likelihood based simultaneous component analysis van den Berg, Robert A; Van Mechelen, Iven; Wilderjans, Tom F; Van Deun, Katrijn; Kiers, Henk AL; Smilde, Age K BMC Bioinformatics Research Article BioMed Central 2009-10-16 /pmc/articles/PMC2771021/ /pubmed/19835617 http://dx.doi.org/10.1186/1471-2105-10-340 Text en Copyright © 2009 Berg et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Integrating functional genomics data using maximum likelihood based simultaneous component analysis
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2771021/ https://www.ncbi.nlm.nih.gov/pubmed/19835617 http://dx.doi.org/10.1186/1471-2105-10-340
