Using OWL reasoning to support the generation of novel gene sets for enrichment analysis
BACKGROUND: The Gene Ontology (GO) consists of over 40,000 terms for biological processes, cell components and gene product activities linked into a graph structure by over 90,000 relationships. It has been used to annotate the functions and cellular locations of several million gene products. The g...
Main authors: | Osumi-Sutherland, David J., Ponta, Enrico, Courtot, Melanie, Parkinson, Helen, Badi, Laura |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813370/ https://www.ncbi.nlm.nih.gov/pubmed/29444698 http://dx.doi.org/10.1186/s13326-018-0175-z |
_version_ | 1783300180857061376 |
---|---|
author | Osumi-Sutherland, David J. Ponta, Enrico Courtot, Melanie Parkinson, Helen Badi, Laura |
author_facet | Osumi-Sutherland, David J. Ponta, Enrico Courtot, Melanie Parkinson, Helen Badi, Laura |
author_sort | Osumi-Sutherland, David J. |
collection | PubMed |
description | BACKGROUND: The Gene Ontology (GO) consists of over 40,000 terms for biological processes, cell components and gene product activities linked into a graph structure by over 90,000 relationships. It has been used to annotate the functions and cellular locations of several million gene products. The graph structure is used by a variety of tools to group annotated genes into sets whose products share function or location. These gene sets are widely used to interpret the results of genomics experiments by assessing which sets are significantly over- or under-represented in results lists. F Hoffmann-La Roche Ltd. has developed a bespoke, manually maintained controlled vocabulary (RCV) for use in over-representation analysis. Many terms in this vocabulary group GO terms in novel ways that cannot easily be derived using the graph structure of the GO. For example, some RCV terms group GO terms by the cell, chemical or tissue type they refer to. Recent improvements in the content and formal structure of the GO make it possible to use logical queries in Web Ontology Language (OWL) to automatically map these cross-cutting classifications to sets of GO terms. We used this approach to automate mapping between RCV and GO, largely replacing the increasingly unsustainable manual mapping process. We then tested the utility of the resulting groupings for over-representation analysis. RESULTS: We successfully mapped 85% of RCV terms to logical OWL definitions and showed that these could be used to recapitulate and extend manual mappings between RCV terms and the sets of GO terms subsumed by them. We also show that gene sets derived from the resulting GO term sets can be used to detect the signatures of cell and tissue types in whole genome expression data. CONCLUSIONS: The rich formal structure of the GO makes it possible to use reasoning to dynamically generate novel, biologically relevant groupings of GO terms. GO term groupings generated with this approach can be used in over-representation analysis to detect cell and tissue type signatures in whole genome expression data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13326-018-0175-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5813370 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-58133702018-02-16 Using OWL reasoning to support the generation of novel gene sets for enrichment analysis Osumi-Sutherland, David J. Ponta, Enrico Courtot, Melanie Parkinson, Helen Badi, Laura J Biomed Semantics Research BACKGROUND: The Gene Ontology (GO) consists of over 40,000 terms for biological processes, cell components and gene product activities linked into a graph structure by over 90,000 relationships. It has been used to annotate the functions and cellular locations of several million gene products. The graph structure is used by a variety of tools to group annotated genes into sets whose products share function or location. These gene sets are widely used to interpret the results of genomics experiments by assessing which sets are significantly over- or under-represented in results lists. F Hoffmann-La Roche Ltd. has developed a bespoke, manually maintained controlled vocabulary (RCV) for use in over-representation analysis. Many terms in this vocabulary group GO terms in novel ways that cannot easily be derived using the graph structure of the GO. For example, some RCV terms group GO terms by the cell, chemical or tissue type they refer to. Recent improvements in the content and formal structure of the GO make it possible to use logical queries in Web Ontology Language (OWL) to automatically map these cross-cutting classifications to sets of GO terms. We used this approach to automate mapping between RCV and GO, largely replacing the increasingly unsustainable manual mapping process. We then tested the utility of the resulting groupings for over-representation analysis. RESULTS: We successfully mapped 85% of RCV terms to logical OWL definitions and showed that these could be used to recapitulate and extend manual mappings between RCV terms and the sets of GO terms subsumed by them. We also show that gene sets derived from the resulting GO term sets can be used to detect the signatures of cell and tissue types in whole genome expression data. CONCLUSIONS: The rich formal structure of the GO makes it possible to use reasoning to dynamically generate novel, biologically relevant groupings of GO terms. GO term groupings generated with this approach can be used in over-representation analysis to detect cell and tissue type signatures in whole genome expression data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13326-018-0175-z) contains supplementary material, which is available to authorized users. BioMed Central 2018-02-14 /pmc/articles/PMC5813370/ /pubmed/29444698 http://dx.doi.org/10.1186/s13326-018-0175-z Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Osumi-Sutherland, David J. Ponta, Enrico Courtot, Melanie Parkinson, Helen Badi, Laura Using OWL reasoning to support the generation of novel gene sets for enrichment analysis |
title | Using OWL reasoning to support the generation of novel gene sets for enrichment analysis |
title_full | Using OWL reasoning to support the generation of novel gene sets for enrichment analysis |
title_fullStr | Using OWL reasoning to support the generation of novel gene sets for enrichment analysis |
title_full_unstemmed | Using OWL reasoning to support the generation of novel gene sets for enrichment analysis |
title_short | Using OWL reasoning to support the generation of novel gene sets for enrichment analysis |
title_sort | using owl reasoning to support the generation of novel gene sets for enrichment analysis |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813370/ https://www.ncbi.nlm.nih.gov/pubmed/29444698 http://dx.doi.org/10.1186/s13326-018-0175-z |
work_keys_str_mv | AT osumisutherlanddavidj usingowlreasoningtosupportthegenerationofnovelgenesetsforenrichmentanalysis AT pontaenrico usingowlreasoningtosupportthegenerationofnovelgenesetsforenrichmentanalysis AT courtotmelanie usingowlreasoningtosupportthegenerationofnovelgenesetsforenrichmentanalysis AT parkinsonhelen usingowlreasoningtosupportthegenerationofnovelgenesetsforenrichmentanalysis AT badilaura usingowlreasoningtosupportthegenerationofnovelgenesetsforenrichmentanalysis |
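
The abstract in this record refers to over-representation analysis of gene sets. As a rough illustration only, the sketch below tests whether a gene set is over-represented in a results list using a one-sided hypergeometric test. The record does not say which statistic the authors used, so the choice of test is an assumption, and the function names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass from k up to the largest possible overlap.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def over_representation(gene_set, hits, background):
    """Return (overlap size, one-sided p-value) for a gene set in a results list.

    All three arguments are iterables of gene identifiers; genes outside the
    background are ignored. This is an assumed, simplified test, not the
    method reported in the article.
    """
    background = set(background)
    gene_set = set(gene_set) & background
    hits = set(hits) & background
    k = len(gene_set & hits)
    return k, hypergeom_sf(k, len(background), len(gene_set), len(hits))


# Hypothetical toy data: 20 background genes, a 5-gene set, 4 hits all in the set.
background = [f"g{i}" for i in range(20)]
gene_set = background[:5]
hits = background[:4]
overlap, p_value = over_representation(gene_set, hits, background)
print(overlap, p_value)
```

In practice many gene sets are tested at once, so the p-values would normally be corrected for multiple testing before interpretation.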