A Text Mining Pipeline Using Active and Deep Learning Aimed at Curating Information in Computational Neuroscience

The curation of neuroscience entities is crucial to ongoing efforts in neuroinformatics and computational neuroscience, such as those being deployed in the context of continuing large-scale brain modelling projects. However, manually sifting through thousands of articles for new information about mo...

Bibliographic Details
Main Authors:	Shardlow, Matthew; Ju, Meizhi; Li, Maolin; O’Reilly, Christian; Iavarone, Elisabetta; McNaught, John; Ananiadou, Sophia
Format:	Online Article Text
Language:	English
Published:	Springer US 2018
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6594987/ https://www.ncbi.nlm.nih.gov/pubmed/30443819 http://dx.doi.org/10.1007/s12021-018-9404-y

_version_	1783430328440848384
author	Shardlow, Matthew; Ju, Meizhi; Li, Maolin; O’Reilly, Christian; Iavarone, Elisabetta; McNaught, John; Ananiadou, Sophia
author_facet	Shardlow, Matthew; Ju, Meizhi; Li, Maolin; O’Reilly, Christian; Iavarone, Elisabetta; McNaught, John; Ananiadou, Sophia
author_sort	Shardlow, Matthew
collection	PubMed
description	The curation of neuroscience entities is crucial to ongoing efforts in neuroinformatics and computational neuroscience, such as those being deployed in the context of continuing large-scale brain modelling projects. However, manually sifting through thousands of articles for new information about modelled entities is a painstaking and low-reward task. Text mining can be used to help a curator extract relevant information from this literature in a systematic way. We propose the application of text mining methods for the neuroscience literature. Specifically, two computational neuroscientists annotated a corpus of entities pertinent to neuroscience using active learning techniques to enable swift, targeted annotation. We then trained machine learning models to recognise the entities that have been identified. The entities covered are Neuron Types, Brain Regions, Experimental Values, Units, Ion Currents, Channels, and Conductances and Model organisms. We tested a traditional rule-based approach, a conditional random field and a model using deep learning named entity recognition, finding that the deep learning model was superior. Our final results show that we can detect a range of named entities of interest to the neuroscientist with a macro average precision, recall and F1 score of 0.866, 0.817 and 0.837 respectively. The contributions of this work are as follows: 1) We provide a set of Named Entity Recognition (NER) tools that are capable of detecting neuroscience entities with performance above or similar to prior work. 2) We propose a methodology for training NER tools for neuroscience that requires very little training data to get strong performance. This can be adapted for any sub-domain within neuroscience. 3) We provide a small corpus with annotations for multiple entity types, as well as annotation guidelines to help others reproduce our experiments. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s12021-018-9404-y) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6594987
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-6594987 2019-07-11 A Text Mining Pipeline Using Active and Deep Learning Aimed at Curating Information in Computational Neuroscience Shardlow, Matthew; Ju, Meizhi; Li, Maolin; O’Reilly, Christian; Iavarone, Elisabetta; McNaught, John; Ananiadou, Sophia Neuroinformatics Original Article Springer US 2018-11-15 2019 /pmc/articles/PMC6594987/ /pubmed/30443819 http://dx.doi.org/10.1007/s12021-018-9404-y Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle	Original Article Shardlow, Matthew Ju, Meizhi Li, Maolin O’Reilly, Christian Iavarone, Elisabetta McNaught, John Ananiadou, Sophia A Text Mining Pipeline Using Active and Deep Learning Aimed at Curating Information in Computational Neuroscience
title	A Text Mining Pipeline Using Active and Deep Learning Aimed at Curating Information in Computational Neuroscience
title_sort	text mining pipeline using active and deep learning aimed at curating information in computational neuroscience
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6594987/ https://www.ncbi.nlm.nih.gov/pubmed/30443819 http://dx.doi.org/10.1007/s12021-018-9404-y
