Identification of research hypotheses and new knowledge from scientific literature
BACKGROUND: Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events,...
Main authors: | Shardlow, Matthew; Batista-Navarro, Riza; Thompson, Paul; Nawaz, Raheel; McNaught, John; Ananiadou, Sophia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6019216/ https://www.ncbi.nlm.nih.gov/pubmed/29940927 http://dx.doi.org/10.1186/s12911-018-0639-1 |
_version_ | 1783335079797325824 |
---|---|
author | Shardlow, Matthew Batista-Navarro, Riza Thompson, Paul Nawaz, Raheel McNaught, John Ananiadou, Sophia |
collection | PubMed |
description | BACKGROUND: Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author’s intended knowledge gain) and New Knowledge (an author’s findings). The method incorporates various features, including a combination of simple MK dimensions. METHODS: We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated. RESULTS: We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state-of-the-art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-scores in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR: 0.802) and New Knowledge (GENIA: 0.829, EU-ADR: 0.836). CONCLUSION: We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-018-0639-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6019216 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6019216 2018-07-06 Identification of research hypotheses and new knowledge from scientific literature Shardlow, Matthew Batista-Navarro, Riza Thompson, Paul Nawaz, Raheel McNaught, John Ananiadou, Sophia BMC Med Inform Decis Mak Research Article BioMed Central 2018-06-25 /pmc/articles/PMC6019216/ /pubmed/29940927 http://dx.doi.org/10.1186/s12911-018-0639-1 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Identification of research hypotheses and new knowledge from scientific literature |
topic | Research Article |