goSTAG: gene ontology subtrees to tag and annotate genes within a set

Bibliographic Details
Main Authors: Bennett, Brian D., Bushel, Pierre R.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2017
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5390446/
https://www.ncbi.nlm.nih.gov/pubmed/28413437
http://dx.doi.org/10.1186/s13029-017-0066-1

Description:
BACKGROUND: Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, there are often hundreds of statistically significant GO terms per gene set. Comparing enriched categories across a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, interpreting the enriched categories to ascertain biological themes representative of the samples can be highly subjective.

RESULTS: We developed goSTAG, which uses GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other high-throughput genomic technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. A GO subtree is constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as its biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs, to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. In contrast, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control.

CONCLUSIONS: goSTAG converts gene lists from genomic analyses into biological themes by testing biological categories for enrichment and constructing GO subtrees from the over-represented terms in each cluster. The terms with the most paths to the root in the subtree are used to represent the biological themes. goSTAG is developed in R as a Bioconductor package and is available at https://bioconductor.org/packages/goSTAG

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13029-017-0066-1) contains supplementary material, which is available to authorized users.
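The description names two computational steps that are easy to misread in prose: the enrichment test behind ORA and the rule that picks each cluster's tag. goSTAG itself is an R/Bioconductor package; the Python below is only a minimal sketch of those two ideas under stated assumptions, not the package's implementation. It assumes the enrichment p-value is a one-sided hypergeometric upper tail, that a cluster's subtree consists of its terms plus all of their GO ancestors, and that the tag is chosen among the cluster's own terms with ties broken by term ID. The intermediate step of clustering terms by their p-values is not shown, and the term IDs (`t1` to `t4`), the toy hierarchy and all function names are hypothetical.

```python
from collections.abc import Iterable, Mapping, Sequence
from math import comb

# Hypothetical toy hierarchy: each term maps to its parent terms.
PARENTS: dict[str, list[str]] = {
    "t1": ["root"],
    "t2": ["root"],
    "t3": ["t1", "t2"],
    "t4": ["t3", "t1"],
}


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided ORA p-value P(X >= k), X ~ Hypergeometric (assumed test).

    N: genes in the universe, K: universe genes annotated to the term,
    n: genes in the submitted list, k: list genes annotated to the term.
    """
    hits = range(k, min(K, n) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)


def build_subtree(cluster_terms: Iterable[str],
                  parents: Mapping[str, Sequence[str]]) -> set[str]:
    """Cluster terms plus every ancestor reachable through `parents` (assumed rule)."""
    subtree: set[str] = set()
    stack = list(cluster_terms)
    while stack:
        term = stack.pop()
        if term not in subtree:
            subtree.add(term)
            stack.extend(parents.get(term, ()))
    return subtree


def path_counts(subtree: set[str], parents: Mapping[str, Sequence[str]],
                root: str) -> dict[str, int]:
    """Number of distinct upward paths from each subtree term to `root`."""
    memo: dict[str, int] = {root: 1}

    def n_paths(term: str) -> int:
        if term not in memo:
            # Only follow parent edges that stay inside the subtree.
            memo[term] = sum(n_paths(p) for p in parents.get(term, ()) if p in subtree)
        return memo[term]

    return {term: n_paths(term) for term in subtree}


def tag_cluster(cluster_terms: Sequence[str],
                parents: Mapping[str, Sequence[str]], root: str) -> str:
    """Return the cluster term with the most paths to the root; ties go to the smallest ID."""
    subtree = build_subtree(cluster_terms, parents)
    counts = path_counts(subtree, parents, root)
    return max(sorted(cluster_terms), key=lambda t: counts[t])


if __name__ == "__main__":
    print(hypergeom_upper_tail(k=3, n=4, K=5, N=20))  # 155/4845, about 0.032
    print(tag_cluster(["t1", "t3", "t4"], PARENTS, "root"))  # t4
```

With the toy hierarchy, `t4` reaches `root` along three paths (`t4 -> t3 -> t1`, `t4 -> t3 -> t2`, and `t4 -> t1`), so it is chosen over `t3` (two paths) and `t1` (one path). Counting paths with memoization keeps the cost linear in the number of subtree edges, even though the number of paths itself can grow quickly in a deep DAG.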

Journal: Source Code for Biology and Medicine (Source Code Biol Med), Software article
Publication date: 2017-04-13
License: © The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.