A corpus of plant–disease relations in the biomedical domain

Bibliographic Details
Main authors: Kim, Baeksoo; Choi, Wonjun; Lee, Hyunju
Format: Online, Article, Text
Language: English
Published: Public Library of Science 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6713337/
https://www.ncbi.nlm.nih.gov/pubmed/31461491
http://dx.doi.org/10.1371/journal.pone.0221582

Collection: PubMed
Description:
BACKGROUND: Many new medicines have been derived from natural sources such as plants, which have a long history of use in disease treatment. Their benefits and side effects have therefore been studied, and plant-related information, including plant–disease relations, has accumulated in Medline articles. Because Medline contains a large number of articles written in natural language, text mining is important for extracting this information. However, no corpus of plant–disease relations has been available yet, so we aimed to construct one.

METHODS AND RESULTS: In this study, we designed and annotated a plant–disease relations corpus and proposed a computational model to predict plant–disease relations using the corpus. We categorized plant–disease relations into four types: treatments of diseases, causes of diseases, associations, and negative relations. To construct the corpus, we first created annotation guidelines and randomly selected 200 Medline abstracts. In these abstracts, we identified 1,405 plant mentions and 1,755 disease mentions, which were annotated with 105 unique plant identifiers and 237 unique disease identifiers. From the sentences containing at least one plant mention and one disease mention, we extracted 878 plant and 1,077 disease entities, yielding a corpus of 1,309 plant–disease relations from 199 abstracts. To verify the effectiveness of the corpus, we proposed a convolutional neural network model using the shortest dependency path (SDP-CNN) and applied it to the constructed corpus. The micro F-score under ten-fold cross-validation was 0.764. We also applied the SDP-CNN model to all Medline abstracts; on 483 randomly selected sentences in which a plant and a disease co-occur, the model achieved a precision of 0.707.

CONCLUSION: The plant–disease relations corpus is unique and represents an important resource for biomedical text mining. The corpus of plant–disease relations is available at http://gcancer.org/pdr/.
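
The SDP-CNN model described above takes as input the shortest dependency path between a plant mention and a disease mention. The record does not describe how the authors encode that path, so the following Python is only a minimal sketch under stated assumptions: the sentence has already been parsed into a dependency tree given as a list of head indices (-1 for the root) with one label per token, each entity mention is reduced to a single token index, and each edge on the path is tagged with a hypothetical `up:`/`down:` direction prefix. The function name `shortest_dependency_path` is illustrative, and the convolutional network that consumes the path is not shown.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def shortest_dependency_path(
    tokens: List[str],
    heads: List[int],
    labels: List[str],
    src: int,
    dst: int,
) -> Optional[List[str]]:
    """Return the shortest dependency path from token src to token dst.

    heads[i] is the index of token i's syntactic head (-1 for the root) and
    labels[i] is the label of the edge from that head to token i. The result
    alternates words and direction-tagged edge labels (hypothetical encoding),
    or is None if the two tokens are not connected.
    """
    # Treat the tree as an undirected graph, remembering edge direction in the tag.
    adj: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(tokens))}
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append((head, "up:" + labels[child]))    # child -> head
            adj[head].append((child, "down:" + labels[child]))  # head -> child

    # Breadth-first search from src, recording how each token was reached.
    prev: Dict[int, Tuple[int, str]] = {}
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr, tag in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                prev[nbr] = (node, tag)
                queue.append(nbr)

    if dst not in seen:
        return None

    # Walk back from dst to src, then reverse so the path reads src -> dst.
    path = [tokens[dst]]
    node = dst
    while node != src:
        parent, tag = prev[node]
        path.extend([tag, tokens[parent]])
        node = parent
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy parse of "Ginseng reduces fatigue" (hypothetical example sentence).
    toks = ["Ginseng", "reduces", "fatigue"]
    print(shortest_dependency_path(toks, [1, -1, 1], ["nsubj", "root", "obj"], 0, 2))
    # ['Ginseng', 'up:nsubj', 'reduces', 'down:obj', 'fatigue']
```

Because a well-formed dependency tree is connected and acyclic, the breadth-first search returns the unique path between the two tokens; `None` is returned only when the parse is not a single tree (for example, when it has two roots).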
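
The reported micro F-score pools true positives, false positives, and false negatives over all relation types before computing precision and recall. Below is a minimal Python sketch, assuming each candidate plant–disease pair carries exactly one gold label and one predicted label. The label names (including a hypothetical `"none"` for pairs without a relation), the function name `micro_prf`, and the choice of which types count as positive classes are all assumptions, since the record does not specify them.

```python
from typing import Iterable, Sequence, Tuple


def micro_prf(
    gold: Sequence[str],
    pred: Sequence[str],
    positive_labels: Iterable[str],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score over the given relation types.

    gold[i] and pred[i] are the labels of the i-th candidate plant-disease pair.
    Labels outside positive_labels (e.g. a hypothetical "none") mean "no relation".
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    positives = set(positive_labels)
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        # A predicted relation is a true positive only if its type matches gold.
        if p in positives:
            if p == g:
                tp += 1
            else:
                fp += 1
        # A gold relation that was missed or mistyped is a false negative.
        if g in positives and g != p:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score
```

Under ten-fold cross-validation, micro-averaging can be done either by pooling the counts from all ten folds or by averaging the per-fold scores; the two can differ slightly, and the record does not say which one produced 0.764.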

ID: pubmed-6713337
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), published 2019-08-28
License: © 2019 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.