Automatic, context-specific generation of Gene Ontology slims

BACKGROUND: The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. Gene Ontology (GO) is widely used to capture detailed expert knowledge in genomic-scale datasets and...

Bibliographic Details
Main Authors: Davis, Melissa J; Sehgal, Muhammad Shoaib B; Ragan, Mark A
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098080/
https://www.ncbi.nlm.nih.gov/pubmed/20929524
http://dx.doi.org/10.1186/1471-2105-11-498
author Davis, Melissa J
Sehgal, Muhammad Shoaib B
Ragan, Mark A
collection PubMed
description BACKGROUND: The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. Gene Ontology (GO) is widely used to capture detailed expert knowledge in genomic-scale datasets and as a consequence has grown to contain many terms, making it unwieldy for many applications. To increase its ease of manipulation and efficiency of use, subsets called GO slims are often created by collapsing terms upward into more general, high-level terms relevant to a particular context. Creation of a GO slim currently requires manipulation and editing of GO by an expert (or community) familiar with both the ontology and the biological context. Decisions about which terms to include are necessarily subjective, and the creation process itself and subsequent curation are time-consuming and largely manual. RESULTS: Here we present an objective framework for generating customised ontology slims for specific annotated datasets, exploiting information latent in the structure of the ontology graph and in the annotation data. This framework combines ontology engineering approaches, and a data-driven algorithm that draws on graph and information theory. We illustrate this method by application to GO, generating GO slims at different information thresholds, characterising their depth of semantics and demonstrating the resulting gains in statistical power. CONCLUSIONS: Our GO slim creation pipeline is available for use in conjunction with any GO-annotated dataset, and creates dataset-specific, objectively defined slims. This method is fast and scalable for application to other biomedical ontologies.
format Text
id pubmed-3098080
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3098080 2011-07-08 Automatic, context-specific generation of Gene Ontology slims Davis, Melissa J Sehgal, Muhammad Shoaib B Ragan, Mark A BMC Bioinformatics Methodology Article
BioMed Central 2010-10-07 /pmc/articles/PMC3098080/ /pubmed/20929524 http://dx.doi.org/10.1186/1471-2105-11-498 Text en Copyright ©2010 Davis et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Automatic, context-specific generation of Gene Ontology slims
topic Methodology Article
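
The abstract describes slims as subsets built by collapsing terms upward into more general terms, selected at different information thresholds. The sketch below illustrates that general idea only; it is not the authors' pipeline, whose details this record does not give. The toy ontology, the gene annotations, the function names, and the choice of an annotation-frequency information content (IC) with a simple "IC at or below threshold" selection rule are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy ontology (child -> set of parents). These are NOT real GO IDs.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "AB": {"A", "B"},
}

# Hypothetical direct annotations (gene -> set of terms).
ANNOTATIONS = {
    "g1": {"A1"},
    "g2": {"A2"},
    "g3": {"AB"},
    "g4": {"B"},
}


def ancestors_or_self(term, parents):
    """Return the term plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def information_content(annotations, parents):
    """IC(t) = -log2(fraction of genes annotated to t or to any descendant)."""
    genes_per_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            # Propagate each direct annotation up to all ancestors.
            for anc in ancestors_or_self(term, parents):
                genes_per_term[anc].add(gene)
    total = len(annotations)
    return {t: -math.log2(len(g) / total) for t, g in genes_per_term.items()}


def slim_terms(ic, threshold):
    """Assumed selection rule: keep terms whose IC is at or below the threshold."""
    return {term for term, value in ic.items() if value <= threshold}


def map_to_slim(annotations, parents, slim):
    """Collapse each gene's terms upward to the most specific slim terms."""
    mapped = {}
    for gene, terms in annotations.items():
        hits = set()
        for term in terms:
            hits |= ancestors_or_self(term, parents) & slim
        # Drop any slim term that is a proper ancestor of another retained term.
        redundant = set()
        for hit in hits:
            redundant |= ancestors_or_self(hit, parents) - {hit}
        mapped[gene] = hits - redundant
    return mapped


ic = information_content(ANNOTATIONS, PARENTS)
slim = slim_terms(ic, threshold=1.0)
print(sorted(slim))
print(map_to_slim(ANNOTATIONS, PARENTS, slim))
```

Raising the threshold admits more specific terms into the slim, and lowering it collapses annotations toward the root. In this sketch that is the only knob that trades detail against the number of genes per term.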