
A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters

Despite the rapid global movement towards electronic health records, clinical letters written in unstructured natural languages are still the preferred form of inter-practitioner communication about patients. These letters, when archived over a long period of time, provide invaluable longitudinal cl...


Bibliographic Details
Main authors: Liu, Wei; Chung, Bo Chuen; Wang, Rui; Ng, Jonathon; Morlet, Nigel
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674942/
https://www.ncbi.nlm.nih.gov/pubmed/26664724
http://dx.doi.org/10.1186/s13755-015-0013-y
_version_ 1782404979712065536
author Liu, Wei
Chung, Bo Chuen
Wang, Rui
Ng, Jonathon
Morlet, Nigel
author_facet Liu, Wei
Chung, Bo Chuen
Wang, Rui
Ng, Jonathon
Morlet, Nigel
author_sort Liu, Wei
collection PubMed
description Despite the rapid global movement towards electronic health records, clinical letters written in unstructured natural language are still the preferred form of inter-practitioner communication about patients. These letters, when archived over a long period of time, provide invaluable longitudinal clinical details on individual patients and patient populations. In this paper we present three unsupervised approaches: sequential pattern mining (PrefixSpan), the frequency- and linguistics-based C-Value, and keyphrase extraction from co-occurrence graphs (TextRank), to automatically extract single and multi-word medical terms without domain-specific knowledge. Because each of the three approaches focuses on different aspects of the language feature space, we propose a genetic algorithm to learn the best parameters for linearly integrating the three extractors for optimal performance against domain expert annotations. Around 30,000 clinical letters sent over the past decade from ophthalmology specialists to general practitioners at an eye clinic are anonymised as the corpus to evaluate the effectiveness of the ensemble against individual extractors. With minimal annotation, the ensemble achieves an average F-measure of 65.65% when considering only complex medical terms, and an F-measure of 72.47% if we take single word terms (i.e. unigrams) into consideration, markedly better than the three term extraction techniques when used alone.
format Online
Article
Text
id pubmed-4674942
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4674942 2015-12-11 A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters Liu, Wei; Chung, Bo Chuen; Wang, Rui; Ng, Jonathon; Morlet, Nigel Health Inf Sci Syst Research Despite the rapid global movement towards electronic health records, clinical letters written in unstructured natural language are still the preferred form of inter-practitioner communication about patients. These letters, when archived over a long period of time, provide invaluable longitudinal clinical details on individual patients and patient populations. In this paper we present three unsupervised approaches: sequential pattern mining (PrefixSpan), the frequency- and linguistics-based C-Value, and keyphrase extraction from co-occurrence graphs (TextRank), to automatically extract single and multi-word medical terms without domain-specific knowledge. Because each of the three approaches focuses on different aspects of the language feature space, we propose a genetic algorithm to learn the best parameters for linearly integrating the three extractors for optimal performance against domain expert annotations. Around 30,000 clinical letters sent over the past decade from ophthalmology specialists to general practitioners at an eye clinic are anonymised as the corpus to evaluate the effectiveness of the ensemble against individual extractors. With minimal annotation, the ensemble achieves an average F-measure of 65.65% when considering only complex medical terms, and an F-measure of 72.47% if we take single word terms (i.e. unigrams) into consideration, markedly better than the three term extraction techniques when used alone. BioMed Central 2015-12-09 /pmc/articles/PMC4674942/ /pubmed/26664724 http://dx.doi.org/10.1186/s13755-015-0013-y Text en © Liu et al. 2015 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Liu, Wei
Chung, Bo Chuen
Wang, Rui
Ng, Jonathon
Morlet, Nigel
A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters
title A genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters
title_sort genetic algorithm enabled ensemble for unsupervised medical term extraction from clinical letters
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674942/
https://www.ncbi.nlm.nih.gov/pubmed/26664724
http://dx.doi.org/10.1186/s13755-015-0013-y
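
The description above says a genetic algorithm learns the parameters for linearly integrating the three extractors, with performance measured by F-measure against expert annotations. The record gives no further detail, so the following is only a minimal sketch of that general idea and not the authors' implementation. It assumes each extractor yields a score in [0, 1] per candidate term, that a term is accepted when the weighted sum reaches a learned threshold, and that the GA uses tournament selection, uniform crossover, Gaussian mutation, and elitism. All function names, parameters, and the toy scores are hypothetical.

```python
import random


def f_measure(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def ensemble_terms(weights, threshold, scores):
    """Accept a term when the weighted sum of its extractor scores reaches the threshold."""
    accepted = set()
    for term, per_extractor in scores.items():
        combined = sum(w * s for w, s in zip(weights, per_extractor))
        if combined >= threshold:
            accepted.add(term)
    return accepted


def fitness(genome, scores, gold):
    """A genome is one weight per extractor followed by the acceptance threshold."""
    *weights, threshold = genome
    return f_measure(ensemble_terms(weights, threshold, scores), gold)


def evolve(scores, gold, pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    """Search for weights and a threshold that maximise F-measure on the annotations."""
    rng = random.Random(seed)
    n_genes = len(next(iter(scores.values()))) + 1  # extractor weights + threshold

    def score(genome):
        return fitness(genome, scores, gold)

    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = [ranked[0][:], ranked[1][:]]  # elitism: carry over the two best
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            parent_a = max(rng.sample(population, 3), key=score)
            parent_b = max(rng.sample(population, 3), key=score)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
            # Gaussian mutation, clipped so every gene stays in [0, 1].
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate else g
                for g in child
            ]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    return best, score(best)


# Hypothetical toy data: term -> (PrefixSpan, C-Value, TextRank) scores, all made up.
toy_scores = {
    "macular degeneration": (0.9, 0.8, 0.7),
    "cataract": (0.2, 0.1, 0.9),
    "visual acuity": (0.7, 0.9, 0.3),
    "kind regards": (0.8, 0.2, 0.1),
    "next review": (0.6, 0.3, 0.2),
    "intraocular pressure": (0.3, 0.9, 0.6),
}
# Hypothetical expert annotations for the toy candidates.
toy_gold = {"macular degeneration", "cataract", "visual acuity", "intraocular pressure"}

best_genome, best_f = evolve(toy_scores, toy_gold)
print("weights:", best_genome[:-1], "threshold:", best_genome[-1], "F:", best_f)
```

In this sketch the fitness function is the F-measure itself, so the search optimises directly for the metric the description reports. The threshold is evolved alongside the weights because a linear combination alone only ranks terms and does not decide which ones to keep.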