
GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts

Gene Ontology is used extensively in scientific knowledgebases and repositories to organize a wealth of biological information. However, interpreting annotations derived from differential gene lists is often difficult without manually sorting into higher-order categories. To address these issues, we...


Bibliographic Details
Main authors: Hinderer, Eugene W., Moseley, Hunter N. B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289357/
https://www.ncbi.nlm.nih.gov/pubmed/32525872
http://dx.doi.org/10.1371/journal.pone.0233311
_version_ 1783545445409095680
author Hinderer, Eugene W.
Moseley, Hunter N. B.
author_facet Hinderer, Eugene W.
Moseley, Hunter N. B.
author_sort Hinderer, Eugene W.
collection PubMed
description Gene Ontology is used extensively in scientific knowledgebases and repositories to organize a wealth of biological information. However, interpreting annotations derived from differential gene lists is often difficult without manually sorting into higher-order categories. To address these issues, we present GOcats, a novel tool that organizes the Gene Ontology (GO) into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. We tested GOcats performance using subcellular location categories to mine annotations from GO-utilizing knowledgebases and evaluated their accuracy against immunohistochemistry datasets in the Human Protein Atlas (HPA). In comparison to term categorizations generated from UniProt’s controlled vocabulary and from GO slims via OWLTools’ Map2Slim, GOcats outperformed these methods in its ability to mimic human-categorized GO term sets. Unlike the other methods, GOcats relies only on an input of basic keywords from the user (e.g. biologist), not a manually compiled or static set of top-level GO terms. Additionally, by identifying and properly defining relations with respect to semantic scope, GOcats can utilize the traditionally problematic relation, has_part, without encountering erroneous term mapping. We applied GOcats in the comparison of HPA-sourced knowledgebase annotations to experimentally-derived annotations provided by HPA directly. During the comparison, GOcats improved correspondence between the annotation sources by adjusting semantic granularity. GOcats enables the creation of custom, GO slim-like filters to map fine-grained gene annotations from gene annotation files to general subcellular compartments without needing to hand-select a set of GO terms for categorization. Moreover, GOcats can customize the level of semantic specificity for annotation categories. 
Furthermore, GOcats enables a safe and more comprehensive semantic scoping utilization of go-core, allowing for a more complete utilization of information available in GO. Together, these improvements can impact a variety of GO knowledgebase data mining use-cases as well as knowledgebase curation and quality control.
format Online
Article
Text
id pubmed-7289357
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7289357 2020-06-15 GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts Hinderer, Eugene W. Moseley, Hunter N. B. PLoS One Research Article Public Library of Science 2020-06-11 /pmc/articles/PMC7289357/ /pubmed/32525872 http://dx.doi.org/10.1371/journal.pone.0233311 Text en © 2020 Hinderer III, Moseley http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Hinderer, Eugene W.
Moseley, Hunter N. B.
GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts
title GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts
title_full GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts
title_fullStr GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts
title_full_unstemmed GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts
title_short GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts
title_sort gocats: a tool for categorizing gene ontology into subgraphs of user-defined concepts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289357/
https://www.ncbi.nlm.nih.gov/pubmed/32525872
http://dx.doi.org/10.1371/journal.pone.0233311
work_keys_str_mv AT hinderereugenew gocatsatoolforcategorizinggeneontologyintosubgraphsofuserdefinedconcepts
AT moseleyhunternb gocatsatoolforcategorizinggeneontologyintosubgraphsofuserdefinedconcepts