OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles
There is a pressing need for the automated extraction of chemical reaction information because of the rapid growth of scientific documents. The previously reported works in the literature for the procedure extraction either (a) did not consider the semantic relations between the ac...
Main Authors: | Machi, Kojiro; Akiyama, Seiji; Nagata, Yuuya; Yoshioka, Masaharu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10647022/ https://www.ncbi.nlm.nih.gov/pubmed/37859303 http://dx.doi.org/10.1021/acs.jcim.3c01449 |
_version_ | 1785147483019542528 |
---|---|
author | Machi, Kojiro Akiyama, Seiji Nagata, Yuuya Yoshioka, Masaharu |
author_facet | Machi, Kojiro Akiyama, Seiji Nagata, Yuuya Yoshioka, Masaharu |
author_sort | Machi, Kojiro |
collection | PubMed |
description | There is a pressing need for the automated extraction of chemical reaction information because of the rapid growth of scientific documents. The previously reported works in the literature for the procedure extraction either (a) did not consider the semantic relations between the action and argument or (b) defined a detailed schema for the extraction. The former method was insufficient for reproducing the reaction, while the latter methods were too specific to their own schema and did not consider the general semantic relation between the verb and argument. In addition, they did not provide an annotated text that aligned with the structured procedure. Along these lines, in this work, we propose a corpus named organic synthesis procedures with argument roles (OSPAR) that is annotated with rolesets to consider the semantic relation between the verb and argument. We also provide rolesets for chemical reactions, especially for organic synthesis, which represent the argument roles of actions in the corpus. More specifically, we annotated 112 organic synthesis procedures in journal articles from Organic Syntheses and defined 19 new rolesets in addition to 29 rolesets from an existing language resource (Proposition Bank). After that, we constructed a simple deep learning system trained on OSPAR and discussed the usefulness of the corpus by comparing it with chemical description language (XDL) generated by a natural language processing tool, namely, SynthReader. While our system’s output required more detailed parsing, it covered comparable information against XDL. Moreover, we confirmed that the validation of the output action sequence was easy as it was aligned with the original text. |
format | Online Article Text |
id | pubmed-10647022 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10647022 2023-11-15 OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles Machi, Kojiro Akiyama, Seiji Nagata, Yuuya Yoshioka, Masaharu J Chem Inf Model There is a pressing need for the automated extraction of chemical reaction information because of the rapid growth of scientific documents. The previously reported works in the literature for the procedure extraction either (a) did not consider the semantic relations between the action and argument or (b) defined a detailed schema for the extraction. The former method was insufficient for reproducing the reaction, while the latter methods were too specific to their own schema and did not consider the general semantic relation between the verb and argument. In addition, they did not provide an annotated text that aligned with the structured procedure. Along these lines, in this work, we propose a corpus named organic synthesis procedures with argument roles (OSPAR) that is annotated with rolesets to consider the semantic relation between the verb and argument. We also provide rolesets for chemical reactions, especially for organic synthesis, which represent the argument roles of actions in the corpus. More specifically, we annotated 112 organic synthesis procedures in journal articles from Organic Syntheses and defined 19 new rolesets in addition to 29 rolesets from an existing language resource (Proposition Bank). After that, we constructed a simple deep learning system trained on OSPAR and discussed the usefulness of the corpus by comparing it with chemical description language (XDL) generated by a natural language processing tool, namely, SynthReader. While our system’s output required more detailed parsing, it covered comparable information against XDL. Moreover, we confirmed that the validation of the output action sequence was easy as it was aligned with the original text. American Chemical Society 2023-10-20 /pmc/articles/PMC10647022/ /pubmed/37859303 http://dx.doi.org/10.1021/acs.jcim.3c01449 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Machi, Kojiro Akiyama, Seiji Nagata, Yuuya Yoshioka, Masaharu OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles |
title | OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles |
title_full | OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles |
title_fullStr | OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles |
title_full_unstemmed | OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles |
title_short | OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles |
title_sort | ospar: a corpus for extraction of organic synthesis procedures with argument roles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10647022/ https://www.ncbi.nlm.nih.gov/pubmed/37859303 http://dx.doi.org/10.1021/acs.jcim.3c01449 |
work_keys_str_mv | AT machikojiro osparacorpusforextractionoforganicsynthesisprocedureswithargumentroles AT akiyamaseiji osparacorpusforextractionoforganicsynthesisprocedureswithargumentroles AT nagatayuuya osparacorpusforextractionoforganicsynthesisprocedureswithargumentroles AT yoshiokamasaharu osparacorpusforextractionoforganicsynthesisprocedureswithargumentroles |