Probabilistic principal component analysis for metabolomic data

BACKGROUND: Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model. RESULTS: H...

Bibliographic Details
Main authors:	Nyamundanda, Gift; Brennan, Lorraine; Gormley, Isobel Claire
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3006395/ https://www.ncbi.nlm.nih.gov/pubmed/21092268 http://dx.doi.org/10.1186/1471-2105-11-571

collection	PubMed
description	BACKGROUND: Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model. RESULTS: Here, probabilistic principal component analysis (PPCA), which addresses some of the limitations of PCA, is reviewed and extended. A novel extension of PPCA, called probabilistic principal component and covariates analysis (PPCCA), is introduced, which provides a flexible approach to jointly modeling metabolomic data and additional covariate information. The use of a mixture of PPCA models for discovering the number of inherent groups in metabolomic data is demonstrated. The jackknife technique is employed to construct confidence intervals for estimated model parameters throughout. The optimal number of principal components is determined through the use of the Bayesian Information Criterion model selection tool, which is modified to address the high dimensionality of the data. CONCLUSIONS: The methods presented are illustrated through an application to metabolomic data sets. Jointly modeling metabolomic data and covariates was successfully achieved and has the potential to provide deeper insight into the underlying data structure. Examination of confidence intervals for the model parameters, such as loadings, allows for principled and clear interpretation of the underlying data structure. A software package called MetabolAnalyze, freely available through the R statistical software, has been developed to facilitate implementation of the presented methods in the metabolomics field.
format	Text
id	pubmed-3006395
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3006395 2011-01-07 BMC Bioinformatics Methodology Article BioMed Central 2010-11-23 /pmc/articles/PMC3006395/ /pubmed/21092268 http://dx.doi.org/10.1186/1471-2105-11-571 Text en Copyright ©2010 Nyamundanda et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Probabilistic principal component analysis for metabolomic data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3006395/ https://www.ncbi.nlm.nih.gov/pubmed/21092268 http://dx.doi.org/10.1186/1471-2105-11-571
