
Identification of research hypotheses and new knowledge from scientific literature

BACKGROUND: Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events,...

Bibliographic Details
Main authors: Shardlow, Matthew; Batista-Navarro, Riza; Thompson, Paul; Nawaz, Raheel; McNaught, John; Ananiadou, Sophia
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019216/
https://www.ncbi.nlm.nih.gov/pubmed/29940927
http://dx.doi.org/10.1186/s12911-018-0639-1
author Shardlow, Matthew
Batista-Navarro, Riza
Thompson, Paul
Nawaz, Raheel
McNaught, John
Ananiadou, Sophia
collection PubMed
description BACKGROUND: Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author’s intended knowledge gain) and New Knowledge (an author’s findings). The method incorporates various features, including a combination of simple MK dimensions. METHODS: We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated. RESULTS: We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state-of-the-art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-scores in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR: 0.802) and New Knowledge (GENIA: 0.829, EU-ADR: 0.836). CONCLUSION: We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-018-0639-1) contains supplementary material, which is available to authorized users.
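The abstract reports its results as F1-scores. As a reminder of what that metric measures, here is a minimal sketch of the standard F1 computation (the harmonic mean of precision and recall) for a binary label such as "Research Hypothesis". The function name `f1_from_counts` and the example counts are hypothetical illustrations; they are not taken from the article and do not reproduce its evaluation.

```python
def f1_from_counts(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-score given true-positive, false-positive and false-negative counts."""
    predicted = true_pos + false_pos   # items the classifier labelled positive
    actual = true_pos + false_neg      # items that are positive in the gold standard
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0                     # avoid division by zero when nothing is correct
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
score = f1_from_counts(true_pos=8, false_pos=2, false_neg=2)
print(round(score, 3))
```

Because F1 is a harmonic mean, a classifier cannot reach a high score by favouring precision at the expense of recall, or the reverse.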
format Online
Article
Text
id pubmed-6019216
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6019216 2018-07-06 Identification of research hypotheses and new knowledge from scientific literature Shardlow, Matthew Batista-Navarro, Riza Thompson, Paul Nawaz, Raheel McNaught, John Ananiadou, Sophia BMC Med Inform Decis Mak Research Article BioMed Central 2018-06-25 /pmc/articles/PMC6019216/ /pubmed/29940927 http://dx.doi.org/10.1186/s12911-018-0639-1 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Identification of research hypotheses and new knowledge from scientific literature
topic Research Article