Cargando…
goSTAG: gene ontology subtrees to tag and annotate genes within a set
BACKGROUND: Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing e...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5390446/ https://www.ncbi.nlm.nih.gov/pubmed/28413437 http://dx.doi.org/10.1186/s13029-017-0066-1 |
_version_ | 1782521462535487488 |
---|---|
author | Bennett, Brian D. Bushel, Pierre R. |
author_facet | Bennett, Brian D. Bushel, Pierre R. |
author_sort | Bennett, Brian D. |
collection | PubMed |
description | BACKGROUND: Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples can be highly subjective from the interpretation of the enriched categories. RESULTS: We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control. CONCLUSIONS: goSTAG converts gene lists from genomic analyses into biological themes by enriching biological categories and constructing GO subtrees from over-represented terms in the clusters. The terms with the most paths to the root in the subtree are used to represent the biological themes. goSTAG is developed in R as a Bioconductor package and is available at https://bioconductor.org/packages/goSTAG ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13029-017-0066-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5390446 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-53904462017-04-14 goSTAG: gene ontology subtrees to tag and annotate genes within a set Bennett, Brian D. Bushel, Pierre R. Source Code Biol Med Software BACKGROUND: Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, often times there are hundreds of statistically significant GO terms per gene set. Comparing enriched categories between a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples can be highly subjective from the interpretation of the enriched categories. RESULTS: We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control. CONCLUSIONS: goSTAG converts gene lists from genomic analyses into biological themes by enriching biological categories and constructing GO subtrees from over-represented terms in the clusters. The terms with the most paths to the root in the subtree are used to represent the biological themes. goSTAG is developed in R as a Bioconductor package and is available at https://bioconductor.org/packages/goSTAG ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13029-017-0066-1) contains supplementary material, which is available to authorized users. BioMed Central 2017-04-13 /pmc/articles/PMC5390446/ /pubmed/28413437 http://dx.doi.org/10.1186/s13029-017-0066-1 Text en © The Author(s). 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Bennett, Brian D. Bushel, Pierre R. goSTAG: gene ontology subtrees to tag and annotate genes within a set |
title | goSTAG: gene ontology subtrees to tag and annotate genes within a set |
title_full | goSTAG: gene ontology subtrees to tag and annotate genes within a set |
title_fullStr | goSTAG: gene ontology subtrees to tag and annotate genes within a set |
title_full_unstemmed | goSTAG: gene ontology subtrees to tag and annotate genes within a set |
title_short | goSTAG: gene ontology subtrees to tag and annotate genes within a set |
title_sort | gostag: gene ontology subtrees to tag and annotate genes within a set |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5390446/ https://www.ncbi.nlm.nih.gov/pubmed/28413437 http://dx.doi.org/10.1186/s13029-017-0066-1 |
work_keys_str_mv | AT bennettbriand gostaggeneontologysubtreestotagandannotategeneswithinaset AT bushelpierrer gostaggeneontologysubtreestotagandannotategeneswithinaset |