Developing a biocuration workflow for AgBase, a non-model organism database

AgBase provides annotation for agricultural gene products using the Gene Ontology (GO) and Plant Ontology, as appropriate. Unlike model organism species, agricultural species have a body of literature that does not just focus on gene function; to improve efficiency, we use text mining to identify li...

Bibliographic Details
Main authors: Pillai, Lakshmi, Chouvarine, Philippe, Tudor, Catalina O., Schmidt, Carl J., Vijay-Shanker, K., McCarthy, Fiona M.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500517/
https://www.ncbi.nlm.nih.gov/pubmed/23160411
http://dx.doi.org/10.1093/database/bas038
author Pillai, Lakshmi
Chouvarine, Philippe
Tudor, Catalina O.
Schmidt, Carl J.
Vijay-Shanker, K.
McCarthy, Fiona M.
collection PubMed
description AgBase provides annotation for agricultural gene products using the Gene Ontology (GO) and Plant Ontology, as appropriate. Unlike model organism species, agricultural species have a body of literature that does not just focus on gene function; to improve efficiency, we use text mining to identify literature for curation. The first component of our annotation interface is the gene prioritization interface that ranks gene products for annotation. Biocurators select the top-ranked gene and mark annotation for these genes as ‘in progress’ or ‘completed’; links enable biocurators to move directly to our biocuration interface (BI). Our BI includes all current GO annotation for gene products and is the main interface to add/modify AgBase curation data. The BI also displays Extracting Genic Information from Text (eGIFT) results for each gene product. eGIFT is a web-based, text-mining tool that associates ranked, informative terms (iTerms) and the articles and sentences containing them, with genes. Moreover, iTerms are linked to GO terms, where they match either a GO term name or a synonym. This enables AgBase biocurators to rapidly identify literature for further curation based on possible GO terms. Because most agricultural species do not have standardized literature, eGIFT searches all gene names and synonyms to associate articles with genes. As many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene, and filtering is applied to remove abstracts that mention a gene in passing. The BI is linked to our Journal Database (JDB) where corresponding journal citations are stored. Just as importantly, biocurators also add to the JDB citations that have no GO annotation. The AgBase BI also supports bulk annotation upload to facilitate our Inferred from electronic annotation of agricultural gene products. All annotations must pass standard GO Consortium quality checking before release in AgBase. 
Database URL: http://www.agbase.msstate.edu/
format Online
Article
Text
id pubmed-3500517
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-35005172012-11-19 Developing a biocuration workflow for AgBase, a non-model organism database Pillai, Lakshmi Chouvarine, Philippe Tudor, Catalina O. Schmidt, Carl J. Vijay-Shanker, K. McCarthy, Fiona M. Database (Oxford) BioCreative Virtual Issue AgBase provides annotation for agricultural gene products using the Gene Ontology (GO) and Plant Ontology, as appropriate. Unlike model organism species, agricultural species have a body of literature that does not just focus on gene function; to improve efficiency, we use text mining to identify literature for curation. The first component of our annotation interface is the gene prioritization interface that ranks gene products for annotation. Biocurators select the top-ranked gene and mark annotation for these genes as ‘in progress’ or ‘completed’; links enable biocurators to move directly to our biocuration interface (BI). Our BI includes all current GO annotation for gene products and is the main interface to add/modify AgBase curation data. The BI also displays Extracting Genic Information from Text (eGIFT) results for each gene product. eGIFT is a web-based, text-mining tool that associates ranked, informative terms (iTerms) and the articles and sentences containing them, with genes. Moreover, iTerms are linked to GO terms, where they match either a GO term name or a synonym. This enables AgBase biocurators to rapidly identify literature for further curation based on possible GO terms. Because most agricultural species do not have standardized literature, eGIFT searches all gene names and synonyms to associate articles with genes. As many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to this gene, and filtering is applied to remove abstracts that mention a gene in passing. The BI is linked to our Journal Database (JDB) where corresponding journal citations are stored. 
Just as importantly, biocurators also add to the JDB citations that have no GO annotation. The AgBase BI also supports bulk annotation upload to facilitate our Inferred from electronic annotation of agricultural gene products. All annotations must pass standard GO Consortium quality checking before release in AgBase. Database URL: http://www.agbase.msstate.edu/ Oxford University Press 2012-11-15 /pmc/articles/PMC3500517/ /pubmed/23160411 http://dx.doi.org/10.1093/database/bas038 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
title Developing a biocuration workflow for AgBase, a non-model organism database
topic BioCreative Virtual Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500517/
https://www.ncbi.nlm.nih.gov/pubmed/23160411
http://dx.doi.org/10.1093/database/bas038