An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records
Electronic medical records (EMRs) include a great deal of valuable patient data, which is, however, unstructured. Therefore, there is a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, today, it is hardly feasible for researchers to utilize text data of...
Main authors: | Koshman, Varvara; Funkner, Anastasia; Kovalchuk, Sergey |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8778877/ https://www.ncbi.nlm.nih.gov/pubmed/35055340 http://dx.doi.org/10.3390/jpm12010025 |
_version_ | 1784637436349906944 |
---|---|
author | Koshman, Varvara Funkner, Anastasia Kovalchuk, Sergey |
author_facet | Koshman, Varvara Funkner, Anastasia Kovalchuk, Sergey |
author_sort | Koshman, Varvara |
collection | PubMed |
description | Electronic medical records (EMRs) include a great deal of valuable patient data, which is, however, unstructured. Therefore, there is a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, it is currently hardly feasible for researchers to use EMR text data to train machine learning models in the biomedical domain. We present an unsupervised approach to medical data annotation. Syntactic trees are produced from the initial sentences using morphological and syntactic analysis. In the retrieved trees, similar subtrees are grouped using Node2Vec and Word2Vec and labeled using domain vocabularies and Wikidata categories. The use of Wikidata categories increased the fraction of labeled sentences 5.5-fold compared with labeling with domain vocabularies only. We show on a validation dataset that the proposed labeling method generates meaningful labels correctly for 92.7% of groups. Annotation with domain vocabularies and Wikidata categories covered more than 82% of the sentences in the corpus; when extended with timestamp and event labels, coverage reached 97% of sentences. The resulting method can be used to label EMRs in Russian automatically. Additionally, the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies. |
format | Online Article Text |
id | pubmed-8778877 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8778877 2022-01-22 An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records Koshman, Varvara Funkner, Anastasia Kovalchuk, Sergey J Pers Med Article Electronic medical records (EMRs) include a great deal of valuable patient data, which is, however, unstructured. Therefore, there is a lack of both labeled medical text data in Russian and tools for automatic annotation. As a result, it is currently hardly feasible for researchers to use EMR text data to train machine learning models in the biomedical domain. We present an unsupervised approach to medical data annotation. Syntactic trees are produced from the initial sentences using morphological and syntactic analysis. In the retrieved trees, similar subtrees are grouped using Node2Vec and Word2Vec and labeled using domain vocabularies and Wikidata categories. The use of Wikidata categories increased the fraction of labeled sentences 5.5-fold compared with labeling with domain vocabularies only. We show on a validation dataset that the proposed labeling method generates meaningful labels correctly for 92.7% of groups. Annotation with domain vocabularies and Wikidata categories covered more than 82% of the sentences in the corpus; when extended with timestamp and event labels, coverage reached 97% of sentences. The resulting method can be used to label EMRs in Russian automatically. Additionally, the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies. MDPI 2022-01-01 /pmc/articles/PMC8778877/ /pubmed/35055340 http://dx.doi.org/10.3390/jpm12010025 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Koshman, Varvara Funkner, Anastasia Kovalchuk, Sergey An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records |
title | An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records |
title_sort | unsupervised approach to structuring and analyzing repetitive semantic structures in free text of electronic medical records |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8778877/ https://www.ncbi.nlm.nih.gov/pubmed/35055340 http://dx.doi.org/10.3390/jpm12010025 |