Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian
To obtain accurate predictive models in medicine, it is necessary to use complete relevant information about the patient. We propose an approach for extracting temporal expressions from unlabeled natural language texts. This approach can be used for the first analysis of the corpus, for data labeli...
Main authors: | Funkner, Anastasia A.; Kovalchuk, Sergey V. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303688/ http://dx.doi.org/10.1007/978-3-030-50423-6_44 |
_version_ | 1783548113579933696 |
---|---|
author | Funkner, Anastasia A.; Kovalchuk, Sergey V. |
collection | PubMed |
description | To obtain accurate predictive models in medicine, it is necessary to use complete relevant information about the patient. We propose an approach for extracting temporal expressions from unlabeled natural language texts. This approach can be used for the first analysis of the corpus, for data labeling as the first stage, or for obtaining linguistic constructions that can be used for a rule-based approach to retrieve information. Our method includes the sequential use of several machine learning and natural language processing methods: classification of sentences, transformation of bag-of-words frequencies, clustering of sentences with time expressions, classification of new data into clusters, and construction of sentence profiles using feature importances. With this method, we derive the list of the most frequent time expressions and extract events and/or time events for 9801 sentences of anamnesis in Russian. The proposed approach is independent of the corpus language and can be used for other tasks, for example, extracting an experiencer of a disease. |
format | Online Article Text |
id | pubmed-7303688 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-73036882020-06-19 Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian Funkner, Anastasia A.; Kovalchuk, Sergey V. Computational Science – ICCS 2020 Article To obtain accurate predictive models in medicine, it is necessary to use complete relevant information about the patient. We propose an approach for extracting temporal expressions from unlabeled natural language texts. This approach can be used for the first analysis of the corpus, for data labeling as the first stage, or for obtaining linguistic constructions that can be used for a rule-based approach to retrieve information. Our method includes the sequential use of several machine learning and natural language processing methods: classification of sentences, transformation of bag-of-words frequencies, clustering of sentences with time expressions, classification of new data into clusters, and construction of sentence profiles using feature importances. With this method, we derive the list of the most frequent time expressions and extract events and/or time events for 9801 sentences of anamnesis in Russian. The proposed approach is independent of the corpus language and can be used for other tasks, for example, extracting an experiencer of a disease. 2020-05-23 /pmc/articles/PMC7303688/ http://dx.doi.org/10.1007/978-3-030-50423-6_44 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303688/ http://dx.doi.org/10.1007/978-3-030-50423-6_44 |
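
The description field names a sequence of steps: transforming bag-of-words frequencies, clustering sentences that contain time expressions, assigning new sentences to clusters, and building sentence profiles. The following is a minimal sketch of that general idea, not the authors' implementation. It rests on several assumptions: the toy English sentences are hypothetical stand-ins for Russian anamnesis text, the first step (classifying sentences as containing a time expression or not) is skipped, a small deterministic cosine k-means stands in for whatever clustering the article uses, and cluster profiles are taken from the heaviest centroid terms rather than from classifier feature importances. All function names are illustrative.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokenizer (a stand-in for real Russian preprocessing)."""
    return sentence.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def normalize(vec):
    """Scale a sparse vector to unit length (empty vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {term: w / norm for term, w in vec.items()}


def tfidf(doc, idf):
    """Unit-length TF-IDF vector; terms unseen during fitting are ignored."""
    counts = Counter(term for term in doc if term in idf)
    return normalize({term: c * idf[term] for term, c in counts.items()})


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def nearest(vec, centers):
    """Index of the most similar cluster center (used for new sentences too)."""
    return max(range(len(centers)), key=lambda j: cosine(vec, centers[j]))


def centroid(vectors):
    """Unit-length mean direction of a group of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds the weights term by term
    return normalize(dict(total))


def cluster(vectors, k, iterations=10):
    """Cosine k-means with deterministic farthest-first initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [nearest(v, centers) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return centers, labels


def profile(center, top=2):
    """Heaviest terms of a cluster center, ties broken alphabetically."""
    return sorted(center, key=lambda term: (-center[term], term))[:top]


# Hypothetical toy corpus: sentences assumed to contain a time expression.
train = [
    "pain started two years ago",
    "surgery five years ago",
    "fever since last monday",
    "cough since last week",
]
docs = [tokenize(s) for s in train]
idf = fit_idf(docs)
centers, labels = cluster([tfidf(d, idf) for d in docs], k=2)

# Assign an unseen sentence to one of the learned clusters.
new_label = nearest(tfidf(tokenize("headache three years ago"), idf), centers)
print(labels, new_label, [profile(c) for c in centers])
```

In this sketch the cluster profiles act as candidate time-expression patterns: the terms that dominate a centroid are the words the cluster's sentences share, which is the kind of recurring construction a rule-based extractor could then be built around.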