A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters
Main authors: | Liu, Wei; Chung, Bo Chuen; Wang, Rui; Ng, Jonathon; Morlet, Nigel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674942/ https://www.ncbi.nlm.nih.gov/pubmed/26664724 http://dx.doi.org/10.1186/s13755-015-0013-y |
_version_ | 1782404979712065536 |
---|---|
author | Liu, Wei; Chung, Bo Chuen; Wang, Rui; Ng, Jonathon; Morlet, Nigel |
author_sort | Liu, Wei |
collection | PubMed |
description | Despite the rapid global movement towards electronic health records, clinical letters written in unstructured natural language are still the preferred form of inter-practitioner communication about patients. Archived over a long period, these letters provide invaluable longitudinal clinical details on individual patients and on patient populations. In this paper we present three unsupervised approaches for automatically extracting single- and multi-word medical terms without domain-specific knowledge: sequential pattern mining (PrefixSpan), the frequency- and linguistics-based C-Value, and keyphrase extraction from co-occurrence graphs (TextRank). Because each approach focuses on a different aspect of the language feature space, we propose a genetic algorithm that learns the best parameters for linearly combining the three extractors, optimising performance against domain expert annotations. Around 30,000 clinical letters, sent over the past decade from ophthalmology specialists at an eye clinic to general practitioners, were anonymised to form the corpus used to evaluate the ensemble against the individual extractors. With minimal annotation, the ensemble achieves an average F-measure of 65.65% when only complex medical terms are considered, and an F-measure of 72.47% when single-word terms (i.e. unigrams) are also included, markedly better than any of the three term extraction techniques used alone. |
format | Online Article Text |
id | pubmed-4674942 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4674942 2015-12-11 A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters Liu, Wei; Chung, Bo Chuen; Wang, Rui; Ng, Jonathon; Morlet, Nigel Health Inf Sci Syst Research BioMed Central 2015-12-09 /pmc/articles/PMC4674942/ /pubmed/26664724 http://dx.doi.org/10.1186/s13755-015-0013-y Text en © Liu et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674942/ https://www.ncbi.nlm.nih.gov/pubmed/26664724 http://dx.doi.org/10.1186/s13755-015-0013-y |
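
The abstract says only that a genetic algorithm learns the parameters for linearly combining the PrefixSpan, C-Value and TextRank extractors against expert annotations; it does not give the genome encoding, the operators or any threshold. The Python sketch below is therefore an illustrative assumption, not the authors' implementation. It assumes each extractor's scores are already normalised to [0, 1], encodes a genome as three weights plus a hypothetical acceptance threshold, and uses truncation selection, one-point crossover and Gaussian mutation, scoring each genome by F-measure against a gold term set. The function names (`f_measure`, `ensemble`, `evolve`) and the toy terms and scores are hypothetical and are not data from the paper.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical input: each candidate term maps to its (PrefixSpan, C-Value, TextRank)
# scores, assumed here to be already normalised to [0, 1].
Scores = Dict[str, Tuple[float, float, float]]


def f_measure(predicted: Set[str], gold: Set[str]) -> float:
    """Balanced F-measure (F1) of predicted terms against expert-annotated terms."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def ensemble(scores: Scores, weights: Sequence[float], threshold: float) -> Set[str]:
    """Linear ensemble: accept a term if its weighted score sum reaches the threshold."""
    return {
        term
        for term, s in scores.items()
        if sum(w * x for w, x in zip(weights, s)) >= threshold
    }


def fitness(genome: Sequence[float], scores: Scores, gold: Set[str]) -> float:
    # Assumed genome layout: three extractor weights followed by one threshold.
    return f_measure(ensemble(scores, genome[:3], genome[3]), gold)


def evolve(scores: Scores, gold: Set[str], pop_size: int = 30,
           generations: int = 40, mutation_rate: float = 0.2,
           seed: int = 0) -> Tuple[List[float], float]:
    """Search weights and threshold with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, scores, gold), reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, kept as elites
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, 3)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clamped so every gene stays in [0, 1].
            child = [
                min(1.0, max(0.0, g + rng.gauss(0.0, 0.1))) if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        population = parents + children
    best = max(population, key=lambda g: fitness(g, scores, gold))
    return best, fitness(best, scores, gold)


# Toy data for illustration only; not taken from the paper's corpus.
TOY_SCORES: Scores = {
    "macular oedema": (0.9, 0.8, 0.7),
    "visual acuity": (0.8, 0.9, 0.6),
    "cataract surgery": (0.7, 0.6, 0.9),
    "next week": (0.6, 0.1, 0.2),
    "thank you": (0.5, 0.2, 0.1),
}
TOY_GOLD = {"macular oedema", "visual acuity", "cataract surgery"}

if __name__ == "__main__":
    best_genome, best_f = evolve(TOY_SCORES, TOY_GOLD)
    print("weights:", best_genome[:3], "threshold:", best_genome[3], "F:", best_f)
```

Keeping the top half of each generation unchanged (elitism) means the best F-measure found never decreases between generations. The paper may well use different selection, crossover or mutation operators, or tune additional parameters beyond the three weights.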