
PCA2GO: a new multivariate statistics based method to identify highly expressed GO-Terms

BACKGROUND: Several tools have been developed to explore and search Gene Ontology (GO) databases allowing efficient GO enrichment analysis and GO tree visualization. Nevertheless, identification of highly specific GO-terms in complex data sets is relatively complicated and the display of GO term ass...


Bibliographic Details
Main authors: Bruckskotten, Marc, Looso, Mario, Cemiĉ, Franz, Konzer, Anne, Hemberger, Jürgen, Krüger, Marcus, Braun, Thomas
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2910024/
https://www.ncbi.nlm.nih.gov/pubmed/20565932
http://dx.doi.org/10.1186/1471-2105-11-336
author Bruckskotten, Marc
Looso, Mario
Cemiĉ, Franz
Konzer, Anne
Hemberger, Jürgen
Krüger, Marcus
Braun, Thomas
collection PubMed
description BACKGROUND: Several tools have been developed to explore and search Gene Ontology (GO) databases, allowing efficient GO enrichment analysis and GO tree visualization. Nevertheless, identifying highly specific GO terms in complex data sets is relatively complicated, and displaying GO term assignments and GO enrichment analysis in simple tables or pie charts is not optimal. Valuable information, such as the hierarchical position of a single GO term within the GO tree (topological ordering) or enrichment within a complex set of biological experiments, is not displayed. Pie charts based on GO tree levels are themselves one-dimensional graphs, which cannot properly or efficiently represent the hierarchical specificity for the biological system being studied. RESULTS: Here we present a new method, which we name PCA2GO, capable of GO analysis in complex multidimensional experimental settings. We employed principal component analysis (PCA) and developed a new score that takes into account the relative frequency of certain GO terms and their specificity (hierarchical position) within the GO graph. We evaluated the correlation between our representation score R and a standard measure of enrichment, namely p-values, to convey the versatility of our approach relative to other methods and to point out differences between our method and commonly used enrichment analyses. Although p-values and the R score formally measure different quantities, they should be correlated, because the relative frequencies of GO term occurrences within a dataset are an indirect measure of the number of proteins related to each term. Therefore they are also related to enrichment. We showed that our score enables us to identify more specific GO terms, i.e. those positioned further down the GO graph, than other common tools used for this purpose. PCA2GO allows visualization and detection of multidimensional dependencies both within the acyclic graph (GO tree) and across the experimental settings.
Our method is intended for the analysis of several experimental sets rather than a single set, as is the case for standard enrichment tools. To demonstrate the usefulness of our approach, we performed a PCA2GO analysis of a fractionated cardiomyocyte protein dataset, which was identified by enhanced liquid chromatography-mass spectrometry (GeLC-MS). The analysis enabled us to detect distinct groups of proteins that accurately reflect properties of biochemical cell fractions. CONCLUSIONS: We conclude that PCA2GO is an alternative, efficient GO analysis tool with unique features for detection and visualization of multidimensional dependencies within the dataset under study. PCA2GO reveals strongly correlated GO terms within the experimental setting (in this case, different fractions) by PCA group formation, and it detects more specific GO terms within experiment-dependent GO term groups than standard p-value calculations.
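The abstract describes two ingredients: a per-term score built from relative frequency and hierarchical position, and a PCA over several experimental sets. The sketch below illustrates that general idea only. The abstract does not give the formula for the R score, so the weighting used here (relative frequency times normalized depth) is a stand-in assumption, and the term names, depths, counts, and function names are all invented for illustration. The first principal component is found by plain power iteration on the covariance matrix to keep the example dependency-free.

```python
import math

# Hypothetical toy input: GO term -> (depth in the GO graph, protein counts per fraction).
# Term names, depths and counts are invented; they are not data from the paper.
TERMS = {
    "term_a": (3, [10, 1, 1]),
    "term_b": (6, [8, 2, 1]),
    "term_c": (4, [1, 9, 8]),
    "term_d": (7, [1, 8, 10]),
}


def toy_score_matrix(terms):
    """Build a terms x fractions matrix of placeholder scores.

    Assumed weighting (NOT the paper's R score):
    relative frequency within a fraction * depth / maximum depth.
    """
    names = list(terms)
    n_fractions = len(terms[names[0]][1])
    totals = [sum(terms[t][1][j] for t in names) for j in range(n_fractions)]
    max_depth = max(terms[t][0] for t in names)
    matrix = []
    for t in names:
        depth, counts = terms[t]
        row = []
        for j in range(n_fractions):
            rel_freq = counts[j] / totals[j] if totals[j] else 0.0
            row.append(rel_freq * depth / max_depth)
        matrix.append(row)
    return names, matrix


def first_principal_component(rows, iterations=500):
    """Return (loadings, scores) of the first principal component via power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the fractions (columns).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    v = [1.0 / math.sqrt(m)] * m  # unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    # Project each centered row (GO term) onto the component.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


names, matrix = toy_score_matrix(TERMS)
loadings, scores = first_principal_component(matrix)
pc1 = dict(zip(names, scores))
for name in names:
    print(f"{name}: PC1 = {pc1[name]:+.3f}")
```

In this toy data, terms that dominate the same fraction land on the same side of the first component, which is the kind of group formation the abstract attributes to the PCA step. The sign of a principal component is arbitrary, so only the grouping is meaningful, not which side is positive.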
format Text
id pubmed-2910024
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2910024 2010-07-27 BMC Bioinformatics Methodology Article BioMed Central 2010-06-21 /pmc/articles/PMC2910024/ /pubmed/20565932 http://dx.doi.org/10.1186/1471-2105-11-336 Text en Copyright ©2010 Bruckskotten et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title PCA2GO: a new multivariate statistics based method to identify highly expressed GO-Terms
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2910024/
https://www.ncbi.nlm.nih.gov/pubmed/20565932
http://dx.doi.org/10.1186/1471-2105-11-336