OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles

There is a pressing need for the automated extraction of chemical reaction information because of the rapid growth of scientific documents. The previously reported works in the literature for the procedure extraction either (a) did not consider the semantic relations between the ac...

Bibliographic Details
Main Authors: Machi, Kojiro; Akiyama, Seiji; Nagata, Yuuya; Yoshioka, Masaharu
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10647022/
https://www.ncbi.nlm.nih.gov/pubmed/37859303
http://dx.doi.org/10.1021/acs.jcim.3c01449
_version_ 1785147483019542528
author Machi, Kojiro
Akiyama, Seiji
Nagata, Yuuya
Yoshioka, Masaharu
collection PubMed
description There is a pressing need for the automated extraction of chemical reaction information because of the rapid growth of scientific documents. Previously reported work on procedure extraction either (a) did not consider the semantic relations between the action and argument or (b) defined a detailed schema for the extraction. The former approach was insufficient for reproducing the reaction, while the latter methods were too specific to their own schema and did not consider the general semantic relation between the verb and argument. In addition, they did not provide an annotated text aligned with the structured procedure. In this work, we propose a corpus named Organic Synthesis Procedures with Argument Roles (OSPAR) that is annotated with rolesets to capture the semantic relation between the verb and argument. We also provide rolesets for chemical reactions, especially for organic synthesis, which represent the argument roles of actions in the corpus. More specifically, we annotated 112 organic synthesis procedures in journal articles from Organic Syntheses and defined 19 new rolesets in addition to 29 rolesets from an existing language resource (Proposition Bank). We then constructed a simple deep learning system trained on OSPAR and discussed the usefulness of the corpus by comparing its output with chemical description language (XDL) generated by a natural language processing tool, namely, SynthReader. While our system’s output required more detailed parsing, it covered information comparable to that of XDL. Moreover, we confirmed that validating the output action sequence was easy because it was aligned with the original text.
format Online
Article
Text
id pubmed-10647022
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10647022 2023-11-15 OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles Machi, Kojiro Akiyama, Seiji Nagata, Yuuya Yoshioka, Masaharu J Chem Inf Model
American Chemical Society 2023-10-20 /pmc/articles/PMC10647022/ /pubmed/37859303 http://dx.doi.org/10.1021/acs.jcim.3c01449 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by/4.0/ Permits the broadest form of re-use, including for commercial purposes, provided that author attribution and integrity are maintained (https://creativecommons.org/licenses/by/4.0/).
title OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10647022/
https://www.ncbi.nlm.nih.gov/pubmed/37859303
http://dx.doi.org/10.1021/acs.jcim.3c01449