Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis

BACKGROUND: Principal component analysis (PCA) has been widely used to visualize high-dimensional metabolomic data in a two- or three-dimensional subspace. In metabolomics, some metabolites (e.g., the top 10 metabolites) have been subjectively selected when using factor loading in PCA, and biologica...

Bibliographic Details
Main Authors:	Yamamoto, Hiroyuki, Fujimori, Tamaki, Sato, Hajime, Ishikawa, Gen, Kami, Kenjiro, Ohashi, Yoshiaki
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015128/ https://www.ncbi.nlm.nih.gov/pubmed/24555693 http://dx.doi.org/10.1186/1471-2105-15-51

author	Yamamoto, Hiroyuki; Fujimori, Tamaki; Sato, Hajime; Ishikawa, Gen; Kami, Kenjiro; Ohashi, Yoshiaki
collection	PubMed
description	BACKGROUND: Principal component analysis (PCA) has been widely used to visualize high-dimensional metabolomic data in a two- or three-dimensional subspace. In metabolomics, some metabolites (e.g., the top 10 metabolites) have been subjectively selected when using factor loading in PCA, and biological inferences are made for these metabolites. However, this approach may lead to biased biological inferences because these metabolites are not objectively selected with statistical criteria. RESULTS: We propose a statistical procedure that selects metabolites with statistical hypothesis testing of the factor loading in PCA and makes biological inferences about these significant metabolites with a metabolite set enrichment analysis (MSEA). This procedure depends on the fact that the eigenvector in PCA for autoscaled data is proportional to the correlation coefficient between the PC score and each metabolite level. We applied this approach to two sets of metabolomic data from mouse liver samples: 136 of 282 metabolites in the first case study and 66 of 275 metabolites in the second case study were statistically significant. This result suggests that to set the number of metabolites before the analysis is inappropriate because the number of significant metabolites differs in each study when factor loading is used in PCA. Moreover, when an MSEA of these significant metabolites was performed, significant metabolic pathways were detected, which were acceptable in terms of previous biological knowledge. CONCLUSIONS: It is essential to select metabolites statistically to make unbiased biological inferences from metabolomic data when using factor loading in PCA. We propose a statistical procedure to select metabolites with statistical hypothesis testing of the factor loading in PCA, and to draw biological inferences about these significant metabolites with MSEA. We have developed an R package “mseapca” to facilitate this approach. The “mseapca” package is publicly available at the CRAN website.
format	Online Article Text
id	pubmed-4015128
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4015128 2014-05-23 Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis Yamamoto, Hiroyuki Fujimori, Tamaki Sato, Hajime Ishikawa, Gen Kami, Kenjiro Ohashi, Yoshiaki BMC Bioinformatics Methodology Article BioMed Central 2014-02-21 /pmc/articles/PMC4015128/ /pubmed/24555693 http://dx.doi.org/10.1186/1471-2105-15-51 Text en Copyright © 2014 Yamamoto et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title	Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015128/ https://www.ncbi.nlm.nih.gov/pubmed/24555693 http://dx.doi.org/10.1186/1471-2105-15-51
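
The description above states that, for autoscaled data, the PCA eigenvector is proportional to the correlation coefficient between the PC score and each metabolite level. The following is a minimal sketch of that relationship in Python (standard library only), not the code of the “mseapca” R package. The toy data, the function names, and the use of power iteration are assumptions made for illustration. The test statistic shown is the usual t statistic for a Pearson correlation with n - 2 degrees of freedom, which is assumed here to be the kind of test the description refers to.

```python
import math
import random

def autoscale(X):
    """Center each column and scale it to unit sample variance (autoscaling)."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            Z[i][j] = (col[i] - mean) / sd
    return Z

def first_eigenpair(R, iters=2000):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    p = len(R)
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        v = [sum(R[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    Rw = [sum(R[a][b] * w[b] for b in range(p)) for a in range(p)]
    lam = sum(w[a] * Rw[a] for a in range(p))  # Rayleigh quotient
    return lam, w

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: 30 samples, 4 "metabolites".
# Three follow a shared latent factor f; the fourth is pure noise.
random.seed(0)
n = 30
X = []
for _ in range(n):
    f = random.gauss(0, 1)
    X.append([
        f + random.gauss(0, 0.5),
        f + random.gauss(0, 0.5),
        -f + random.gauss(0, 0.5),
        random.gauss(0, 1),
    ])

Z = autoscale(X)
p = len(Z[0])
# Correlation matrix of the autoscaled data.
R = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1) for b in range(p)]
     for a in range(p)]
lam, w = first_eigenpair(R)

# PC1 scores, and factor loadings computed two ways.
scores = [sum(Z[i][j] * w[j] for j in range(p)) for i in range(n)]
loadings = [math.sqrt(lam) * w[j] for j in range(p)]  # sqrt(eigenvalue) * eigenvector
correlations = [pearson(scores, [Z[i][j] for i in range(n)]) for j in range(p)]

# t statistic for H0: correlation = 0, with n - 2 degrees of freedom.
t_stats = [r * math.sqrt(n - 2) / math.sqrt(1 - r * r) for r in loadings]

for j in range(p):
    print(f"variable {j}: loading={loadings[j]:.4f} "
          f"corr={correlations[j]:.4f} t={t_stats[j]:.2f}")
```

The two loading columns should agree up to floating-point error, which is the proportionality the description relies on. Converting each t statistic to a p-value requires the t distribution, which the Python standard library does not provide, so that step is left out of the sketch.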
