neXtA(5): accelerating annotation of articles via automated approaches in neXtProt

The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA(5), which prioritizes the literature for specific curation require...

Bibliographic Details
Main Authors:	Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4930835/ https://www.ncbi.nlm.nih.gov/pubmed/27374119 http://dx.doi.org/10.1093/database/baw098

_version_	1782440793463586816
author	Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick
author_facet	Mottin, Luc; Gobeill, Julien; Pasche, Emilie; Michel, Pierre-André; Cusin, Isabelle; Gaudet, Pascale; Ruch, Patrick
author_sort	Mottin, Luc
collection	PubMed
description	The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA(5), which prioritizes the literature for specific curation requirements. Our system, neXtA(5), is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the BioMed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. Further experiments are being performed to extend the curation types, in particular protein–protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline. Available on: http://babar.unige.ch:8082/neXtA5 Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp
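Illustrative note: the abstract describes merging a search-engine ranking with an axis-specific ranking through a tuned linear combination, and evaluating the result with the reciprocal rank of the top-returned relevant PMID. The Python sketch below shows one way such a merge could look. It is not the authors' implementation: the min-max normalization, the single weight `alpha`, the function names, and the PMIDs and scores are all assumptions made for illustration, and the record does not report the actual coefficients.

```python
from typing import Dict, List, Set


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two lists are comparable (an assumption)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally strong.
        return {pmid: 1.0 for pmid in scores}
    return {pmid: (s - lo) / (hi - lo) for pmid, s in scores.items()}


def merge_rankings(search_scores: Dict[str, float],
                   axis_scores: Dict[str, float],
                   alpha: float) -> List[str]:
    """Linear combination: alpha * search score + (1 - alpha) * axis score.

    `alpha` is a hypothetical per-axis weight; a PMID missing from one
    list contributes 0.0 for that list.
    """
    s = min_max_normalize(search_scores)
    a = min_max_normalize(axis_scores)
    pmids = set(s) | set(a)
    combined = {p: alpha * s.get(p, 0.0) + (1.0 - alpha) * a.get(p, 0.0)
                for p in pmids}
    # Sort by descending combined score; break ties by PMID for determinism.
    return sorted(pmids, key=lambda p: (-combined[p], p))


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant PMID, or 0.0 if none is returned."""
    for rank, pmid in enumerate(ranking, start=1):
        if pmid in relevant:
            return 1.0 / rank
    return 0.0


# Made-up PMIDs and scores, for illustration only.
search_scores = {"111": 9.0, "222": 6.0, "333": 3.0}  # search-engine relevance
axis_scores = {"111": 0.0, "222": 1.0, "333": 4.0}    # named-entity density

merged = merge_rankings(search_scores, axis_scores, alpha=0.25)
print(merged)
print(reciprocal_rank(merged, {"222"}))
```

With `alpha=1.0` the merged list reduces to the search-engine order, and with `alpha=0.0` to the axis-specific order; averaging `reciprocal_rank` over a set of queries gives the mean reciprocal rank mentioned in the abstract.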
format	Online Article Text
id	pubmed-4930835
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-49308352016-07-05 neXtA(5): accelerating annotation of articles via automated approaches in neXtProt Mottin, Luc Gobeill, Julien Pasche, Emilie Michel, Pierre-André Cusin, Isabelle Gaudet, Pascale Ruch, Patrick Database (Oxford) Original Article The rapid increase in the number of published articles poses a challenge for curated databases to remain up-to-date. To help the scientific community and database curators deal with this issue, we have developed an application, neXtA(5), which prioritizes the literature for specific curation requirements. Our system, neXtA(5), is a curation service composed of three main elements. The first component is a named-entity recognition module, which annotates MEDLINE over some predefined axes. This report focuses on three axes: Diseases, the Molecular Function and Biological Process sub-ontologies of the Gene Ontology (GO). The automatic annotations are then stored in a local database, BioMed, for each annotation axis. Additional entities such as species and chemical compounds are also identified. The second component is an existing search engine, which retrieves the most relevant MEDLINE records for any given query. The third component uses the content of BioMed to generate an axis-specific ranking, which takes into account the density of named-entities as stored in the Biomed database. The two ranked lists are ultimately merged using a linear combination, which has been specifically tuned to support the annotation of each axis. The fine-tuning of the coefficients is formally reported for each axis-driven search. Compared with PubMed, which is the system used by most curators, the improvement is the following: +231% for Diseases, +236% for Molecular Functions and +3153% for Biological Process when measuring the precision of the top-returned PMID (P0 or mean reciprocal rank). The current search methods significantly improve the search effectiveness of curators for three important curation axes. 
Further experiments are being performed to extend the curation types, in particular protein–protein interactions, which require specific relationship extraction capabilities. In parallel, user-friendly interfaces powered with a set of JSON web services are currently being implemented into the neXtProt annotation pipeline. Available on: http://babar.unige.ch:8082/neXtA5 Database URL: http://babar.unige.ch:8082/neXtA5/fetcher.jsp Oxford University Press 2016-07-02 /pmc/articles/PMC4930835/ /pubmed/27374119 http://dx.doi.org/10.1093/database/baw098 Text en © The Author(s) 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Article Mottin, Luc Gobeill, Julien Pasche, Emilie Michel, Pierre-André Cusin, Isabelle Gaudet, Pascale Ruch, Patrick neXtA(5): accelerating annotation of articles via automated approaches in neXtProt
title	neXtA(5): accelerating annotation of articles via automated approaches in neXtProt
title_full	neXtA(5): accelerating annotation of articles via automated approaches in neXtProt
title_fullStr	neXtA(5): accelerating annotation of articles via automated approaches in neXtProt
title_full_unstemmed	neXtA(5): accelerating annotation of articles via automated approaches in neXtProt
title_short	neXtA(5): accelerating annotation of articles via automated approaches in neXtProt
title_sort	nexta(5): accelerating annotation of articles via automated approaches in nextprot
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4930835/ https://www.ncbi.nlm.nih.gov/pubmed/27374119 http://dx.doi.org/10.1093/database/baw098
work_keys_str_mv	AT mottinluc nexta5acceleratingannotationofarticlesviaautomatedapproachesinnextprot AT gobeilljulien nexta5acceleratingannotationofarticlesviaautomatedapproachesinnextprot AT pascheemilie nexta5acceleratingannotationofarticlesviaautomatedapproachesinnextprot AT michelpierreandre nexta5acceleratingannotationofarticlesviaautomatedapproachesinnextprot AT cusinisabelle nexta5acceleratingannotationofarticlesviaautomatedapproachesinnextprot AT gaudetpascale nexta5acceleratingannotationofarticlesviaautomatedapproachesinnextprot AT ruchpatrick nexta5acceleratingannotationofarticlesviaautomatedapproachesinnextprot

Similar Items