
Exploring automatic inconsistency detection for literature-based gene ontology annotation

MOTIVATION: Literature-based gene ontology annotations (GOA) are biological database records that use controlled vocabulary to uniformly represent gene function information that is described in the primary literature. Assurance of the quality of GOA is crucial for supporting biological research. How...


Bibliographic Details
Main authors: Chen, Jiyu, Goudey, Benjamin, Zobel, Justin, Geard, Nicholas, Verspoor, Karin
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235499/
https://www.ncbi.nlm.nih.gov/pubmed/35758780
http://dx.doi.org/10.1093/bioinformatics/btac230
author Chen, Jiyu
Goudey, Benjamin
Zobel, Justin
Geard, Nicholas
Verspoor, Karin
collection PubMed
description MOTIVATION: Literature-based gene ontology annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assuring the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies between the literature used as evidence and the annotated GO terms can be identified; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and cannot keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection. RESULTS: We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detecting inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches that are designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available on GitHub at https://github.com/jiyuc/AutoGOAConsistency.
format Online
Article
Text
id pubmed-9235499
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9235499 2022-06-29 Exploring automatic inconsistency detection for literature-based gene ontology annotation Chen, Jiyu Goudey, Benjamin Zobel, Justin Geard, Nicholas Verspoor, Karin Bioinformatics ISCB/Ismb 2022 Oxford University Press 2022-06-27 /pmc/articles/PMC9235499/ /pubmed/35758780 http://dx.doi.org/10.1093/bioinformatics/btac230 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Exploring automatic inconsistency detection for literature-based gene ontology annotation
topic ISCB/Ismb 2022
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235499/
https://www.ncbi.nlm.nih.gov/pubmed/35758780
http://dx.doi.org/10.1093/bioinformatics/btac230