Automatic consistency assurance for literature-based gene ontology annotation
BACKGROUND: Literature-based gene ontology (GO) annotation is a process where expert curators use uniform expressions to describe gene functions reported in research papers, creating computable representations of information about biological systems. Manual assurance of consistency between GO annota...
Main authors: | Chen, Jiyu; Geard, Nicholas; Zobel, Justin; Verspoor, Karin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8620237/ https://www.ncbi.nlm.nih.gov/pubmed/34823464 http://dx.doi.org/10.1186/s12859-021-04479-9 |
_version_ | 1784605171267928064 |
---|---|
author | Chen, Jiyu Geard, Nicholas Zobel, Justin Verspoor, Karin |
author_facet | Chen, Jiyu Geard, Nicholas Zobel, Justin Verspoor, Karin |
author_sort | Chen, Jiyu |
collection | PubMed |
description | BACKGROUND: Literature-based gene ontology (GO) annotation is a process where expert curators use uniform expressions to describe gene functions reported in research papers, creating computable representations of information about biological systems. Manual assurance of consistency between GO annotations and the associated evidence texts identified by expert curators is reliable but time-consuming, and is infeasible in the context of rapidly growing biological literature. A key challenge is maintaining consistency of existing GO annotations as new studies are published and the GO vocabulary is updated. RESULTS: In this work, we introduce a formalisation of biological database annotation inconsistencies, identifying four distinct types of inconsistency. We propose a novel and efficient method using state-of-the-art text mining models to automatically distinguish between consistent GO annotation and the different types of inconsistent GO annotation. We evaluate this method using a synthetic dataset generated by directed manipulation of instances in an existing corpus, BC4GO. We provide detailed error analysis for demonstrating that the method achieves high precision on more confident predictions. CONCLUSIONS: Two models built using our method for distinct annotation consistency identification tasks achieved high precision and were robust to updates in the GO vocabulary. Our approach demonstrates clear value for human-in-the-loop curation scenarios. |
format | Online Article Text |
id | pubmed-8620237 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8620237 2021-11-29 Automatic consistency assurance for literature-based gene ontology annotation Chen, Jiyu Geard, Nicholas Zobel, Justin Verspoor, Karin BMC Bioinformatics Methodology Article BACKGROUND: Literature-based gene ontology (GO) annotation is a process where expert curators use uniform expressions to describe gene functions reported in research papers, creating computable representations of information about biological systems. Manual assurance of consistency between GO annotations and the associated evidence texts identified by expert curators is reliable but time-consuming, and is infeasible in the context of rapidly growing biological literature. A key challenge is maintaining consistency of existing GO annotations as new studies are published and the GO vocabulary is updated. RESULTS: In this work, we introduce a formalisation of biological database annotation inconsistencies, identifying four distinct types of inconsistency. We propose a novel and efficient method using state-of-the-art text mining models to automatically distinguish between consistent GO annotation and the different types of inconsistent GO annotation. We evaluate this method using a synthetic dataset generated by directed manipulation of instances in an existing corpus, BC4GO. We provide detailed error analysis for demonstrating that the method achieves high precision on more confident predictions. CONCLUSIONS: Two models built using our method for distinct annotation consistency identification tasks achieved high precision and were robust to updates in the GO vocabulary. Our approach demonstrates clear value for human-in-the-loop curation scenarios. 
BioMed Central 2021-11-25 /pmc/articles/PMC8620237/ /pubmed/34823464 http://dx.doi.org/10.1186/s12859-021-04479-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Article Chen, Jiyu Geard, Nicholas Zobel, Justin Verspoor, Karin Automatic consistency assurance for literature-based gene ontology annotation |
title | Automatic consistency assurance for literature-based gene ontology annotation |
title_full | Automatic consistency assurance for literature-based gene ontology annotation |
title_fullStr | Automatic consistency assurance for literature-based gene ontology annotation |
title_full_unstemmed | Automatic consistency assurance for literature-based gene ontology annotation |
title_short | Automatic consistency assurance for literature-based gene ontology annotation |
title_sort | automatic consistency assurance for literature-based gene ontology annotation |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8620237/ https://www.ncbi.nlm.nih.gov/pubmed/34823464 http://dx.doi.org/10.1186/s12859-021-04479-9 |
work_keys_str_mv | AT chenjiyu automaticconsistencyassuranceforliteraturebasedgeneontologyannotation AT geardnicholas automaticconsistencyassuranceforliteraturebasedgeneontologyannotation AT zobeljustin automaticconsistencyassuranceforliteraturebasedgeneontologyannotation AT verspoorkarin automaticconsistencyassuranceforliteraturebasedgeneontologyannotation |