Cargando…
Time Expressions Identification Without Human-Labeled Corpus for Clinical Text Mining in Russian
To obtain accurate predictive models in medicine, it is necessary to use complete relevant information about the patient. We propose an approach for extracting temporary expressions from unlabeled natural language texts. This approach can be used for the first analysis of the corpus, for data labeli...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7303688/ http://dx.doi.org/10.1007/978-3-030-50423-6_44 |
Sumario: | To obtain accurate predictive models in medicine, it is necessary to use complete relevant information about the patient. We propose an approach for extracting temporary expressions from unlabeled natural language texts. This approach can be used for the first analysis of the corpus, for data labeling as the first stage, or for obtaining linguistic constructions that can be used for a rule-based approach to retrieve information. Our method includes the sequential use of several machine learning and natural language processing methods: classification of sentences, the transformation of word bag frequencies, clustering of sentences with time expressions, classification of new data into clusters and construction of sentence profiles using feature importances. With this method, we derive the list of the most frequent time expressions and extract events and/or time events for 9801 sentences of anamnesis in Russian. The proposed approach is independent of the corpus language and can be used for other tasks, for example, extracting an experiencer of a disease. |
---|