Accelerating annotation of articles via automated approaches: evaluation of the neXtA(5) curation-support tool by neXtProt
Main authors: | , , , , , , , , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6301339/ https://www.ncbi.nlm.nih.gov/pubmed/30576492 http://dx.doi.org/10.1093/database/bay129 |
Summary: | The development of efficient text-mining tools promises to boost the curation workflow by significantly reducing the time needed to process the literature into biological databases. We have developed a curation support tool, neXtA(5), that provides a search engine coupled with an annotation system directly integrated into a biocuration workflow. neXtA(5) assists curation with modules optimized for the various curation tasks: document triage, entity recognition and information extraction. Here, we describe the evaluation of neXtA(5) by expert curators. We first assessed the annotations of two independent curators to provide a baseline for comparison. To evaluate the performance of neXtA(5), we submitted requests and compared the neXtA(5) results with the manual curation. The analysis focuses on the usability of neXtA(5) to support the curation of two types of data: biological processes (BPs) and diseases (Ds). We evaluated the relevance of the papers proposed as well as the recall and precision of the suggested annotations. The evaluation of the precision of document triage by neXtA(5) showed that both curators agree with neXtA(5) for 67% (BP) and 63% (D) of abstracts, while the curators agree with each other on accepting or rejecting an abstract ~80% of the time. Hence, the precision of the triage system is satisfactory. For concept extraction, curators approved 35% (BP) and 25% (D) of the neXtA(5) annotations. Conversely, neXtA(5) successfully annotated up to 36% (BP) and 68% (D) of the terms identified by curators. The user feedback obtained in these tests highlighted the need for improvement in the ranking function of neXtA(5) annotations. Therefore, we transformed the information extraction component into an annotation ranking system. This improvement results in a top precision (precision at first rank) of 59% (D) and 63% (BP). These results suggest that, when considering only the first extracted entity, the current system achieves a precision comparable with that of expert biocurators. 
|
---|
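The summary reports precision, recall and top precision (precision at first rank) without defining them. The sketch below shows the standard definitions of these measures and is not the evaluation code used by the authors. The function names and the example terms are hypothetical, chosen only for illustration, and the numbers it produces are unrelated to the figures reported in the summary.

```python
from typing import Sequence, Set, Tuple


def precision_recall(suggested: Set[str], curated: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of suggested annotations against a curated set.

    precision = share of suggested terms that the curator also selected
    recall    = share of curator-selected terms that were suggested
    """
    true_positives = len(suggested & curated)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


def precision_at_first_rank(
    ranked_lists: Sequence[Sequence[str]],
    curated_sets: Sequence[Set[str]],
) -> float:
    """Share of documents whose top-ranked suggestion is in the curated set."""
    if not ranked_lists:
        return 0.0
    hits = sum(
        1
        for ranked, curated in zip(ranked_lists, curated_sets)
        if ranked and ranked[0] in curated
    )
    return hits / len(ranked_lists)


# Hypothetical annotations for one abstract (illustrative terms only).
suggested_terms = {"apoptosis", "cell cycle", "autophagy", "DNA repair"}
curated_terms = {"apoptosis", "DNA repair", "angiogenesis"}
precision, recall = precision_recall(suggested_terms, curated_terms)
print(f"precision={precision:.2f} recall={recall:.2f}")

# Hypothetical ranked suggestions for two abstracts.
ranked = [["apoptosis", "cell cycle"], ["autophagy", "DNA repair"]]
curated = [{"apoptosis"}, {"DNA repair"}]
print(f"precision at first rank={precision_at_first_rank(ranked, curated):.2f}")
```

Under these definitions, a system can have low overall precision yet a high precision at first rank if its ranking function reliably places a correct term first, which is consistent with the shift to an annotation ranking system described in the summary.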