
SAO2Vec: Development of an algorithm for embedding the subject–action–object (SAO) structure using Doc2Vec

In natural-language processing, the subject–action–object (SAO) structure is used to convert unstructured textual data into structured textual data comprising subjects, actions, and objects. This structure is suitable for analyzing the key elements of technology, as well as the relationships between...


Bibliographic Details
Main authors: Kim, Sunhye; Park, Inchae; Yoon, Byungun
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7001927/
https://www.ncbi.nlm.nih.gov/pubmed/32023289
http://dx.doi.org/10.1371/journal.pone.0227930
author Kim, Sunhye
Park, Inchae
Yoon, Byungun
collection PubMed
description In natural-language processing, the subject–action–object (SAO) structure is used to convert unstructured textual data into structured textual data comprising subjects, actions, and objects. This structure is suitable for analyzing the key elements of a technology, as well as the relationships between these elements. However, analysis using the existing SAO structure requires substantial manual processing because the structure does not represent the context of the sentences. Thus, we introduce the concept of SAO2Vec, in which SAO is used to embed the vectors of sentences and documents, for use in text mining of technical documents. First, the technical documents of interest are collected, and SAO structures are extracted from them. Then, sentence vectors are extracted through the Doc2Vec algorithm and are updated using the word vectors in the SAO structure. Finally, each SAO vector is derived from the updated vectors of the sentences that share the same SAO structure. In addition, document vectors are derived from the document's SAO vectors. The results of an experiment in the Internet of Things field indicate that the SAO2Vec method produces 3.1% better accuracy than the Doc2Vec method and 115.0% better accuracy than SAO frequency alone. These results indicate that the proposed SAO2Vec algorithm can be used to improve grouping and similarity analysis by including both the meanings and the contexts of technical elements.
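The pipeline described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not state how sentence vectors are updated or aggregated, so the sketch assumes simple element-wise averaging at every step. The sentence vectors are taken as given (in practice they would come from a Doc2Vec model), and all function names and toy data below are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def update_sentence_vector(sentence_vec, sao, word_vecs):
    """ASSUMPTION: blend the sentence vector with the mean of the
    subject, action, and object word vectors (plain averaging)."""
    sao_word_mean = mean([word_vecs[word] for word in sao])
    return mean([sentence_vec, sao_word_mean])


def sao_vectors(sentences, word_vecs):
    """Map each SAO triple to the mean of the updated vectors of all
    sentences that share that triple (ASSUMPTION: mean aggregation).

    sentences: list of (sao_triple, sentence_vector) pairs.
    """
    grouped = defaultdict(list)
    for sao, vec in sentences:
        grouped[sao].append(update_sentence_vector(vec, sao, word_vecs))
    return {sao: mean(vecs) for sao, vecs in grouped.items()}


def document_vector(doc_saos, sao_vecs):
    """ASSUMPTION: a document vector is the mean of its SAO vectors."""
    return mean([sao_vecs[sao] for sao in doc_saos])


def cosine(u, v):
    """Cosine similarity, usable for grouping and similarity analysis."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


# Toy data: two sentences that share one SAO triple.
word_vecs = {
    "sensor": [1.0, 0.0],
    "measure": [0.0, 1.0],
    "temperature": [1.0, 1.0],
}
sao = ("sensor", "measure", "temperature")
sentences = [(sao, [2.0, 0.0]), (sao, [0.0, 2.0])]

vecs = sao_vectors(sentences, word_vecs)
doc_vec = document_vector([sao], vecs)
```

Averaging is only the simplest choice that matches the wording of the abstract; a weighted update or a different aggregation would fit the same description.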
format Online
Article
Text
id pubmed-7001927
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7001927 2020-02-18 SAO2Vec: Development of an algorithm for embedding the subject–action–object (SAO) structure using Doc2Vec Kim, Sunhye Park, Inchae Yoon, Byungun PLoS One Research Article
Public Library of Science 2020-02-05 /pmc/articles/PMC7001927/ /pubmed/32023289 http://dx.doi.org/10.1371/journal.pone.0227930 Text en © 2020 Kim et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title SAO2Vec: Development of an algorithm for embedding the subject–action–object (SAO) structure using Doc2Vec
topic Research Article