Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian

To obtain accurate predictive models in medicine, it is necessary to use complete relevant information about the patient. We propose an approach for extracting temporal expressions from unlabeled natural language texts. This approach can be used for the first analysis of the corpus, for data labeli...

Bibliographic Details
Main Authors: Funkner, Anastasia A., Kovalchuk, Sergey V.
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303688/
http://dx.doi.org/10.1007/978-3-030-50423-6_44
_version_ 1783548113579933696
author Funkner, Anastasia A.
Kovalchuk, Sergey V.
author_facet Funkner, Anastasia A.
Kovalchuk, Sergey V.
author_sort Funkner, Anastasia A.
collection PubMed
description To obtain accurate predictive models in medicine, it is necessary to use complete relevant information about the patient. We propose an approach for extracting temporal expressions from unlabeled natural language texts. This approach can be used for the first analysis of the corpus, for data labeling as the first stage, or for obtaining linguistic constructions that can be used for a rule-based approach to retrieve information. Our method includes the sequential use of several machine learning and natural language processing methods: classification of sentences, the transformation of bag-of-words frequencies, clustering of sentences with time expressions, classification of new data into clusters and construction of sentence profiles using feature importances. With this method, we derive the list of the most frequent time expressions and extract events and/or time events for 9801 sentences of anamnesis in Russian. The proposed approach is independent of the corpus language and can be used for other tasks, for example, extracting an experiencer of a disease.
format Online
Article
Text
id pubmed-7303688
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7303688 2020-06-19 Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian Funkner, Anastasia A. Kovalchuk, Sergey V. Computational Science – ICCS 2020 Article To obtain accurate predictive models in medicine, it is necessary to use complete relevant information about the patient. We propose an approach for extracting temporal expressions from unlabeled natural language texts. This approach can be used for the first analysis of the corpus, for data labeling as the first stage, or for obtaining linguistic constructions that can be used for a rule-based approach to retrieve information. Our method includes the sequential use of several machine learning and natural language processing methods: classification of sentences, the transformation of bag-of-words frequencies, clustering of sentences with time expressions, classification of new data into clusters and construction of sentence profiles using feature importances. With this method, we derive the list of the most frequent time expressions and extract events and/or time events for 9801 sentences of anamnesis in Russian. The proposed approach is independent of the corpus language and can be used for other tasks, for example, extracting an experiencer of a disease. 2020-05-23 /pmc/articles/PMC7303688/ http://dx.doi.org/10.1007/978-3-030-50423-6_44 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Funkner, Anastasia A.
Kovalchuk, Sergey V.
Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian
title Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian
title_full Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian
title_fullStr Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian
title_full_unstemmed Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian
title_short Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian
title_sort time expressions identification without human-labeled corpus for clinical text mining in russian
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303688/
http://dx.doi.org/10.1007/978-3-030-50423-6_44
work_keys_str_mv AT funkneranastasiaa timeexpressionsidentificationwithouthumanlabeledcorpusforclinicaltextmininginrussian
AT kovalchuksergeyv timeexpressionsidentificationwithouthumanlabeledcorpusforclinicaltextmininginrussian