An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records

Electronic medical records (EMRs) contain a great deal of valuable patient data, which is, however, unstructured. Moreover, there is a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, today, it is hardly feasible for researchers to utilize text data of...

Bibliographic Details
Main authors: Koshman, Varvara; Funkner, Anastasia; Kovalchuk, Sergey
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8778877/
https://www.ncbi.nlm.nih.gov/pubmed/35055340
http://dx.doi.org/10.3390/jpm12010025
author Koshman, Varvara
Funkner, Anastasia
Kovalchuk, Sergey
collection PubMed
description Electronic medical records (EMRs) contain a great deal of valuable patient data, which is, however, unstructured. Moreover, there is a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, it is currently hardly feasible for researchers to use EMR text data to train machine learning models in the biomedical domain. We present an unsupervised approach to medical data annotation. Syntactic trees are produced from the initial sentences using morphological and syntactic analysis. In the resulting trees, similar subtrees are grouped using Node2Vec and Word2Vec and labeled using domain vocabularies and Wikidata categories. Using Wikidata categories increased the fraction of labeled sentences by a factor of 5.5 compared with labeling with domain vocabularies only. We show on a validation dataset that the proposed labeling method generates meaningful, correct labels for 92.7% of groups. Annotation with domain vocabularies and Wikidata categories covered more than 82% of the sentences in the corpus; when extended with timestamp and event labels, coverage reached 97% of sentences. The resulting method can be used to label EMRs in Russian automatically. Additionally, the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.
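The grouping and labeling steps named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for the Node2Vec and Word2Vec embeddings (nothing is trained here), the greedy cosine-threshold grouping is a swapped-in simplification, and the names `SUBTREE_VECTORS`, `VOCABULARY`, `group_subtrees`, and `label_group` are hypothetical. English phrases are used only for readability.

```python
import math

# Hypothetical toy vectors standing in for subtree embeddings
# (assumption: a real pipeline would obtain these from trained models).
SUBTREE_VECTORS = {
    "headache since morning": [0.9, 0.1, 0.0],
    "pain in head": [0.85, 0.15, 0.05],
    "took aspirin": [0.1, 0.9, 0.1],
    "prescribed ibuprofen": [0.05, 0.95, 0.0],
}

# Hypothetical domain vocabulary mapping terms to labels.
VOCABULARY = {
    "headache": "symptom",
    "pain": "symptom",
    "aspirin": "drug",
    "ibuprofen": "drug",
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_subtrees(vectors, threshold=0.9):
    """Greedy grouping: a subtree joins the first group whose seed
    vector is at least `threshold` similar, else it starts a new group."""
    groups = []  # list of (seed_vector, members)
    for text, vec in vectors.items():
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(text)
                break
        else:
            groups.append((vec, [text]))
    return [members for _, members in groups]


def label_group(members, vocabulary):
    """Most frequent vocabulary label among the group's words,
    or None when no word is in the vocabulary."""
    counts = {}
    for text in members:
        for word in text.split():
            label = vocabulary.get(word)
            if label:
                counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get) if counts else None


groups = group_subtrees(SUBTREE_VECTORS)
labels = [label_group(group, VOCABULARY) for group in groups]
```

A group with no vocabulary hit stays unlabeled (`None`) in this sketch; according to the description, the method additionally draws on Wikidata categories for labeling.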
format Online
Article
Text
id pubmed-8778877
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8778877 2022-01-22 J Pers Med Article MDPI 2022-01-01 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records
topic Article