GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts
Gene Ontology is used extensively in scientific knowledgebases and repositories to organize a wealth of biological information. However, interpreting annotations derived from differential gene lists is often difficult without manually sorting into higher-order categories. To address these issues, we...
Main Authors: | Hinderer, Eugene W., Moseley, Hunter N. B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289357/ https://www.ncbi.nlm.nih.gov/pubmed/32525872 http://dx.doi.org/10.1371/journal.pone.0233311 |
_version_ | 1783545445409095680 |
---|---|
author | Hinderer, Eugene W. Moseley, Hunter N. B. |
author_facet | Hinderer, Eugene W. Moseley, Hunter N. B. |
author_sort | Hinderer, Eugene W. |
collection | PubMed |
description | Gene Ontology is used extensively in scientific knowledgebases and repositories to organize a wealth of biological information. However, interpreting annotations derived from differential gene lists is often difficult without manually sorting into higher-order categories. To address these issues, we present GOcats, a novel tool that organizes the Gene Ontology (GO) into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. We tested GOcats performance using subcellular location categories to mine annotations from GO-utilizing knowledgebases and evaluated their accuracy against immunohistochemistry datasets in the Human Protein Atlas (HPA). In comparison to term categorizations generated from UniProt’s controlled vocabulary and from GO slims via OWLTools’ Map2Slim, GOcats outperformed these methods in its ability to mimic human-categorized GO term sets. Unlike the other methods, GOcats relies only on an input of basic keywords from the user (e.g. biologist), not a manually compiled or static set of top-level GO terms. Additionally, by identifying and properly defining relations with respect to semantic scope, GOcats can utilize the traditionally problematic relation, has_part, without encountering erroneous term mapping. We applied GOcats in the comparison of HPA-sourced knowledgebase annotations to experimentally-derived annotations provided by HPA directly. During the comparison, GOcats improved correspondence between the annotation sources by adjusting semantic granularity. GOcats enables the creation of custom, GO slim-like filters to map fine-grained gene annotations from gene annotation files to general subcellular compartments without needing to hand-select a set of GO terms for categorization. Moreover, GOcats can customize the level of semantic specificity for annotation categories. Furthermore, GOcats enables a safe and more comprehensive semantic scoping utilization of go-core, allowing for a more complete utilization of information available in GO. Together, these improvements can impact a variety of GO knowledgebase data mining use-cases as well as knowledgebase curation and quality control. |
format | Online Article Text |
id | pubmed-7289357 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-72893572020-06-15 GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts Hinderer, Eugene W. Moseley, Hunter N. B. PLoS One Research Article Gene Ontology is used extensively in scientific knowledgebases and repositories to organize a wealth of biological information. However, interpreting annotations derived from differential gene lists is often difficult without manually sorting into higher-order categories. To address these issues, we present GOcats, a novel tool that organizes the Gene Ontology (GO) into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. We tested GOcats performance using subcellular location categories to mine annotations from GO-utilizing knowledgebases and evaluated their accuracy against immunohistochemistry datasets in the Human Protein Atlas (HPA). In comparison to term categorizations generated from UniProt’s controlled vocabulary and from GO slims via OWLTools’ Map2Slim, GOcats outperformed these methods in its ability to mimic human-categorized GO term sets. Unlike the other methods, GOcats relies only on an input of basic keywords from the user (e.g. biologist), not a manually compiled or static set of top-level GO terms. Additionally, by identifying and properly defining relations with respect to semantic scope, GOcats can utilize the traditionally problematic relation, has_part, without encountering erroneous term mapping. We applied GOcats in the comparison of HPA-sourced knowledgebase annotations to experimentally-derived annotations provided by HPA directly. During the comparison, GOcats improved correspondence between the annotation sources by adjusting semantic granularity. GOcats enables the creation of custom, GO slim-like filters to map fine-grained gene annotations from gene annotation files to general subcellular compartments without needing to hand-select a set of GO terms for categorization. Moreover, GOcats can customize the level of semantic specificity for annotation categories. Furthermore, GOcats enables a safe and more comprehensive semantic scoping utilization of go-core, allowing for a more complete utilization of information available in GO. Together, these improvements can impact a variety of GO knowledgebase data mining use-cases as well as knowledgebase curation and quality control. Public Library of Science 2020-06-11 /pmc/articles/PMC7289357/ /pubmed/32525872 http://dx.doi.org/10.1371/journal.pone.0233311 Text en © 2020 Hinderer III, Moseley http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Hinderer, Eugene W. Moseley, Hunter N. B. GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts |
title | GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts |
title_full | GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts |
title_fullStr | GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts |
title_full_unstemmed | GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts |
title_short | GOcats: A tool for categorizing Gene Ontology into subgraphs of user-defined concepts |
title_sort | gocats: a tool for categorizing gene ontology into subgraphs of user-defined concepts |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289357/ https://www.ncbi.nlm.nih.gov/pubmed/32525872 http://dx.doi.org/10.1371/journal.pone.0233311 |
work_keys_str_mv | AT hinderereugenew gocatsatoolforcategorizinggeneontologyintosubgraphsofuserdefinedconcepts AT moseleyhunternb gocatsatoolforcategorizinggeneontologyintosubgraphsofuserdefinedconcepts |
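
The description above outlines the general idea of building a category subgraph from user keywords and then mapping fine-grained annotations onto that category. The following is a minimal sketch of that idea on a toy ontology, not the GOcats implementation or its API. All term ids, names, edges, gene names, and the function names `scoped_children`, `categorize`, and `map_annotations` are hypothetical and chosen only for illustration. The sketch assumes that `is_a` and `part_of` edges point from a narrower term to a broader one, while `has_part` points from the broader term to the narrower one and must therefore be flipped before traversal.

```python
from collections import defaultdict

# Toy ontology: term id -> (name, [(relation, target id), ...]).
# Ids, names, and edges are invented for illustration; they are not real GO entries.
TERMS = {
    "T1": ("cell", []),
    "T2": ("nucleus", [("part_of", "T1")]),
    "T3": ("nuclear membrane", [("part_of", "T2"), ("has_part", "T7")]),
    "T4": ("nucleolus", [("part_of", "T2")]),
    "T5": ("mitochondrion", [("part_of", "T1")]),
    "T6": ("mitochondrial matrix", [("part_of", "T5")]),
    "T7": ("pore complex", []),
}

# Assumed scoping direction of each relation (source -> target).
NARROWER_TO_BROADER = {"is_a", "part_of"}
BROADER_TO_NARROWER = {"has_part"}


def scoped_children(terms):
    """Map each term to its directly narrower terms, flipping has_part edges."""
    children = defaultdict(set)
    for source, (_name, edges) in terms.items():
        for relation, target in edges:
            if relation in NARROWER_TO_BROADER:
                children[target].add(source)
            elif relation in BROADER_TO_NARROWER:
                children[source].add(target)
    return children


def categorize(terms, keywords):
    """Return ids of keyword-matching terms plus every term narrower than them."""
    children = scoped_children(terms)
    seeds = [
        term_id
        for term_id, (name, _edges) in terms.items()
        if any(keyword in name.lower() for keyword in keywords)
    ]
    members = set()
    stack = list(seeds)
    while stack:  # depth-first walk toward narrower terms
        term_id = stack.pop()
        if term_id not in members:
            members.add(term_id)
            stack.extend(children[term_id])
    return members


def map_annotations(annotations, categories):
    """Replace each gene's fine-grained term ids with the categories they fall in."""
    return {
        gene: sorted(
            label for label, members in categories.items() if members & set(term_ids)
        )
        for gene, term_ids in annotations.items()
    }


categories = {
    "nucleus": categorize(TERMS, ["nucle"]),
    "mitochondrion": categorize(TERMS, ["mitochond"]),
}
annotations = {"geneA": ["T7"], "geneB": ["T6"], "geneC": ["T4", "T6"], "geneD": ["T1"]}
print(map_annotations(annotations, categories))
```

In this toy graph, "pore complex" does not match the keyword and is reachable only through the flipped `has_part` edge, so it lands in the nucleus category. If that edge were traversed in its stored direction alongside `part_of`, the broader term would be treated as narrower, which is the kind of erroneous mapping the description mentions.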