
Assisting document triage for human kinome curation via machine learning


Bibliographic Details
Main Authors: Hsu, Yi-Yu, Wei, Chih-Hsuan, Lu, Zhiyong
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6146134/
https://www.ncbi.nlm.nih.gov/pubmed/30239677
http://dx.doi.org/10.1093/database/bay091
author Hsu, Yi-Yu
Wei, Chih-Hsuan
Lu, Zhiyong
collection PubMed
description In the era of data explosion, the increasing frequency of published articles presents unorthodox challenges in fulfilling specific curation requirements for bio-literature databases. Recognizing these demands, we designed a document triage system with automatic methods that can retrieve the most relevant articles in curation workflows more efficiently and reduce the workload of biocurators. Since BioCreative VI (2017), we have implemented text-mining processing in our system in the hope of curating articles related to human kinase proteins more effectively. We tested several machine learning methods together with state-of-the-art concept extraction tools. For features, we extracted rich co-occurrence and linguistic information to model the curation process of human kinome articles by the neXtProt database. As shown in the official evaluation on the human kinome curation task in BioCreative VI, our system can effectively retrieve 5.2 and 6.5 kinase articles with the relevant disease (DIS) and biological process (BP) information, respectively, among the top 100 returned results. Compared with neXtA5, our system demonstrates significant improvements in prioritizing kinome-related articles: for the DIS axis, our system achieves 0.458 and 0.109 for mean average precision (MAP) and maximum precision observed, respectively, whereas neXtA5’s best-reported values are 0.41 and 0.04. Our system also outperforms neXtA5 on the BP axis, with a MAP of 0.195 against neXtA5’s reported value of 0.11. These results suggest that our system may be able to assist neXtProt biocurators in practice.
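The description reports results as mean average precision (MAP) and as counts among the top 100 returned results. As a rough illustration of how such ranking metrics are commonly computed, here is a minimal Python sketch. It is not the official BioCreative VI evaluation code: the function names and the toy rankings are hypothetical, and average precision is normalized by the number of relevant documents found in the ranking, which may differ from the normalization used in the official evaluation.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool]) -> float:
    """Mean of precision@i over the ranks i that hold a relevant document.

    Assumption: normalized by the relevant documents present in the ranking.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Average of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy data: one ranked result list per query (for example, per
# kinase), where True marks an article a curator would judge relevant.
ranking_a = [True, False, True, False]
ranking_b = [False, True, False, False]

if __name__ == "__main__":
    print(precision_at_k(ranking_a, 4))
    print(average_precision(ranking_a))
    print(mean_average_precision([ranking_a, ranking_b]))
```

In this sketch each query contributes one average precision value, so a system that places relevant articles near the top of every list scores close to 1, while one that buries them scores close to 0.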
format Online
Article
Text
id pubmed-6146134
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6146134 2018-09-25 Assisting document triage for human kinome curation via machine learning Hsu, Yi-Yu Wei, Chih-Hsuan Lu, Zhiyong Database (Oxford) Original Article Oxford University Press 2018-09-18 /pmc/articles/PMC6146134/ /pubmed/30239677 http://dx.doi.org/10.1093/database/bay091 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Assisting document triage for human kinome curation via machine learning
topic Original Article