
Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR


Bibliographic Details
Main authors: Van Auken, Kimberly; Fey, Petra; Berardini, Tanya Z.; Dodson, Robert; Cooper, Laurel; Li, Donghui; Chan, Juancarlos; Li, Yuling; Basu, Siddhartha; Muller, Hans-Michael; Chisholm, Rex; Huala, Eva; Sternberg, Paul W.
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500519/
https://www.ncbi.nlm.nih.gov/pubmed/23160413
http://dx.doi.org/10.1093/database/bas040
Description
Summary: WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.