A corpus of plant–disease relations in the biomedical domain
BACKGROUND: Many new medicines have been derived from natural sources such as plants, which have a long history of being used for disease treatment. Thus, their benefits and side effects have been studied, and plant-related information, including plant and disease relations, has been accumulated in M...
Main authors: | Kim, Baeksoo; Choi, Wonjun; Lee, Hyunju |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Public Library of Science
2019
|
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6713337/ https://www.ncbi.nlm.nih.gov/pubmed/31461491 http://dx.doi.org/10.1371/journal.pone.0221582 |
_version_ | 1783446855126876160 |
---|---|
author | Kim, Baeksoo Choi, Wonjun Lee, Hyunju |
author_facet | Kim, Baeksoo Choi, Wonjun Lee, Hyunju |
author_sort | Kim, Baeksoo |
collection | PubMed |
description | BACKGROUND: Many new medicines have been derived from natural sources such as plants, which have a long history of being used for disease treatment. Thus, their benefits and side effects have been studied, and plant-related information, including plant and disease relations, has been accumulated in Medline articles. Because numerous articles are available in Medline and are written in natural language, text-mining is important. However, a corpus of plant and disease relations is not available yet. Thus, we aimed to construct such a corpus. METHODS AND RESULTS: In this study, we designed and annotated a plant–disease relations corpus, and proposed a computational model to predict plant–disease relations using the corpus. We categorized plant and disease relations into four types: treatments of diseases, causes of diseases, associations, and negative relations. To construct a corpus of plant–disease relations, we first created its annotation guidelines and randomly selected 200 Medline abstracts. From these abstracts, we identified 1,405 plant mentions and 1,755 disease mentions, annotated to 105 unique plant identifiers and 237 unique disease identifiers, respectively. When we selected sentences containing at least one plant and one disease mention, we extracted 878 plant and 1,077 disease entities, which finally generated a corpus of plant–disease relations including 1,309 relations from 199 abstracts. To verify the effectiveness of the corpus, we proposed a convolutional neural network model with the shortest dependency path (SDP-CNN) and applied it to the constructed corpus. The micro F-score with ten-fold cross-validation was found to be 0.764. We also applied the proposed SDP-CNN model to all Medline abstracts. When we measured its performance for 483 randomly selected plant–disease co-occurring sentences, the model showed a precision of 0.707. CONCLUSION: The plant–disease relations corpus is unique and represents an important resource for biomedical text-mining. The corpus of plant and disease relations is available at http://gcancer.org/pdr/. |
format | Online Article Text |
id | pubmed-6713337 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6713337 2019-09-04 A corpus of plant–disease relations in the biomedical domain Kim, Baeksoo Choi, Wonjun Lee, Hyunju PLoS One Research Article Public Library of Science 2019-08-28 /pmc/articles/PMC6713337/ /pubmed/31461491 http://dx.doi.org/10.1371/journal.pone.0221582 Text en © 2019 Kim et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Kim, Baeksoo Choi, Wonjun Lee, Hyunju A corpus of plant–disease relations in the biomedical domain |
title | A corpus of plant–disease relations in the biomedical domain |
title_sort | corpus of plant–disease relations in the biomedical domain |
topic | Research Article |
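
The description above reports a micro F-score of 0.764 over the four relation types. As a minimal sketch of how a micro-averaged F-score can be computed, the Python below pools true positives, false positives, and false negatives across all classes before computing precision and recall. It is not the authors' evaluation code: the function name `micro_f_score`, the label strings, and the toy data are assumptions made for illustration, and whether the paper counts every relation type in the pooled totals is not stated in this record.

```python
def micro_f_score(gold, predicted):
    """Micro-averaged F-score: pool TP/FP/FN over all labels, then combine."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    labels = set(gold) | set(predicted)
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1          # predicted this label, and it was correct
            elif p == label and g != label:
                fp += 1          # predicted this label, but gold differs
            elif g == label and p != label:
                fn += 1          # gold is this label, but it was missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for the four relation types named in the description.
gold = ["treatment", "cause", "association", "negative", "treatment"]
predicted = ["treatment", "association", "association", "negative", "cause"]
score = micro_f_score(gold, predicted)
```

When every class is included in the pooled counts, as here, each misclassified sentence contributes one false positive and one false negative, so micro precision, recall, and F-score all equal plain accuracy. The value differs from accuracy only if some class is excluded from the totals.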