
Toward an interactive article: integrating journals and biological databases

BACKGROUND: Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the ri...


Bibliographic Details
Main authors: Rangarajan, Arun, Schedl, Tim, Yook, Karen, Chan, Juancarlos, Haenel, Stephen, Otis, Lolly, Faelten, Sharon, DePellegrin-Connelly, Tracey, Isaacson, Ruth, Skrzypek, Marek S, Marygold, Steven J, Stefancsik, Raymund, Cherry, J Michael, Sternberg, Paul W, Müller, Hans-Michael
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3213741/
https://www.ncbi.nlm.nih.gov/pubmed/21595960
http://dx.doi.org/10.1186/1471-2105-12-175
author Rangarajan, Arun
Schedl, Tim
Yook, Karen
Chan, Juancarlos
Haenel, Stephen
Otis, Lolly
Faelten, Sharon
DePellegrin-Connelly, Tracey
Isaacson, Ruth
Skrzypek, Marek S
Marygold, Steven J
Stefancsik, Raymund
Cherry, J Michael
Sternberg, Paul W
Müller, Hans-Michael
author_sort Rangarajan, Arun
collection PubMed
description BACKGROUND: Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five years and have resulted in the development of automated tools that can recognize entities within a document and link those entities to a relevant database. Unfortunately, automated tools cannot resolve ambiguities that arise from one term being used to signify entities that are quite distinct from one another. Instead, resolving these ambiguities requires some manual oversight. Finding the right balance between the speed and portability of automation and the accuracy and flexibility of manual effort is a crucial goal to making text markup a successful venture. RESULTS: We have established a journal article mark-up pipeline that links GENETICS journal articles and the model organism database (MOD) WormBase. This pipeline uses a lexicon built with entities from the database as a first step. The entity markup pipeline results in links from over nine classes of objects including genes, proteins, alleles, phenotypes and anatomical terms. New entities and ambiguities are discovered and resolved by a database curator through a manual quality control (QC) step, along with help from authors via a web form that is provided to them by the journal. New entities discovered through this pipeline are immediately sent to an appropriate curator at the database. Ambiguous entities that do not automatically resolve to one link are resolved by hand ensuring an accurate link. This pipeline has been extended to other databases, namely Saccharomyces Genome Database (SGD) and FlyBase, and has been implemented in marking up a paper with links to multiple databases. 
CONCLUSIONS: Our semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases such as WormBase. Our pipeline results in interactive articles that are data rich with high accuracy. The use of a manual quality control step sets this pipeline apart from other hyperlinking tools and results in benefits to authors, journals, readers and databases.
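The lexicon-first markup step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the example.org URLs, and the names LEXICON, Markup and mark_up are all hypothetical, chosen only to show how unambiguous terms might be linked automatically while terms with more than one candidate are held back for the manual QC step.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lexicon: surface term -> list of (entity class, link target).
# Terms and URLs are placeholders, not real database records.
LEXICON = {
    "abc-1": [("gene", "https://example.org/db/gene/abc-1")],
    "tail": [("anatomy", "https://example.org/db/anatomy/tail")],
    # One term, two distinct entities: cannot be resolved automatically.
    "xy1": [
        ("allele", "https://example.org/db/allele/xy1"),
        ("strain", "https://example.org/db/strain/xy1"),
    ],
}


@dataclass
class Markup:
    html: str                                          # text with links inserted
    needs_review: list = field(default_factory=list)   # terms left for a curator


def mark_up(text, lexicon=LEXICON):
    """Link lexicon terms that resolve to exactly one entity; queue the rest."""
    # Try longer terms first so a short term never shadows a longer one.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(t) for t in terms) + r")(?![\w-])"
    )
    needs_review = []

    def link(match):
        term = match.group(1)
        candidates = lexicon[term]
        if len(candidates) == 1:
            _, url = candidates[0]
            return f'<a href="{url}">{term}</a>'
        # Ambiguous: leave the text unlinked and record it for manual QC.
        if term not in needs_review:
            needs_review.append(term)
        return term

    return Markup(pattern.sub(link, text), needs_review)


result = mark_up("The abc-1 allele xy1 alters the tail.")
print(result.html)
print(result.needs_review)
```

In this sketch the ambiguous term stays as plain text until a curator picks the correct target, which mirrors the division of labour between automation and manual oversight that the abstract describes.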
format Online
Article
Text
id pubmed-3213741
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3213741 2011-11-12 Toward an interactive article: integrating journals and biological databases Rangarajan, Arun Schedl, Tim Yook, Karen Chan, Juancarlos Haenel, Stephen Otis, Lolly Faelten, Sharon DePellegrin-Connelly, Tracey Isaacson, Ruth Skrzypek, Marek S Marygold, Steven J Stefancsik, Raymund Cherry, J Michael Sternberg, Paul W Müller, Hans-Michael BMC Bioinformatics Correspondence BioMed Central 2011-05-19 /pmc/articles/PMC3213741/ /pubmed/21595960 http://dx.doi.org/10.1186/1471-2105-12-175 Text en Copyright ©2011 Rangarajan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Toward an interactive article: integrating journals and biological databases
topic Correspondence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3213741/
https://www.ncbi.nlm.nih.gov/pubmed/21595960
http://dx.doi.org/10.1186/1471-2105-12-175