
Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies

BACKGROUND: Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction...


Bibliographic Details
Main Authors:	Zheng, Shuai; Lu, James J; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A; Wang, Fusheng
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2017
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5442348/ https://www.ncbi.nlm.nih.gov/pubmed/28487265 http://dx.doi.org/10.2196/medinform.7235

author	Zheng, Shuai; Lu, James J; Ghasemzadeh, Nima; Hayek, Salim S; Quyyumi, Arshed A; Wang, Fusheng
author_sort	Zheng, Shuai
collection	PubMed
description	BACKGROUND: Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. OBJECTIVE: Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enable a dynamic interaction between a human and a machine that produces highly accurate results. METHODS: A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. RESULTS: Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each of which combines a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. CONCLUSIONS: IDEAL-X adopts a unique online machine learning–based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve; thus, it is highly adaptable.
format	Online Article Text
id	pubmed-5442348
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-5442348 2017-06-06 JMIR Med Inform Original Paper JMIR Publications 2017-05-09 /pmc/articles/PMC5442348/ /pubmed/28487265 http://dx.doi.org/10.2196/medinform.7235 Text en ©Shuai Zheng, James J Lu, Nima Ghasemzadeh, Salim S Hayek, Arshed A Quyyumi, Fusheng Wang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 09.05.2017. http://creativecommons.org/licenses/by/2.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5442348/ https://www.ncbi.nlm.nih.gov/pubmed/28487265 http://dx.doi.org/10.2196/medinform.7235
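
The METHODS summary in the description field outlines a feedback loop: documents are processed one at a time, each user correction updates the model immediately, and once accuracy reaches a user-acceptable threshold the remaining documents are batch processed. The Python sketch below illustrates only that control flow. It is not the IDEAL-X implementation: the class `OnlineExtractor`, the function `run_interactive`, the `get_feedback` callback, the cue-token counting model, and the `threshold` and `window` parameters are all hypothetical names and simplifications introduced here for illustration.

```python
from collections import Counter, deque


class OnlineExtractor:
    """Toy online learner (an assumption, not the paper's model).

    It remembers which token tends to directly precede the confirmed value.
    """

    def __init__(self):
        self.cue_counts = Counter()

    def predict(self, tokens):
        # Return the token that follows the most frequently confirmed cue, if any.
        best, best_count = None, 0
        for i in range(len(tokens) - 1):
            count = self.cue_counts[tokens[i]]
            if count > best_count:
                best, best_count = tokens[i + 1], count
        return best

    def update(self, tokens, correct_value):
        # Feedback step: reinforce each token that directly precedes the confirmed value.
        for i in range(1, len(tokens)):
            if tokens[i] == correct_value:
                self.cue_counts[tokens[i - 1]] += 1


def run_interactive(documents, get_feedback, threshold=0.95, window=3):
    """Process documents one by one; switch to batch mode once recent accuracy is high enough."""
    model = OnlineExtractor()
    recent = deque(maxlen=window)  # correctness of the last `window` reviewed predictions
    results = []
    for index, doc in enumerate(documents):
        tokens = doc.split()
        predicted = model.predict(tokens)
        if len(recent) == window and sum(recent) / window >= threshold:
            # Accuracy threshold reached: accept the prediction without user review.
            results.append((predicted, "batch"))
            continue
        confirmed = get_feedback(index, predicted)  # user accepts or corrects the value
        recent.append(predicted == confirmed)
        model.update(tokens, confirmed)  # model is updated before the next document
        results.append((confirmed, "reviewed"))
    return results
```

Here `get_feedback` stands in for the human reviewer: it receives the document index and the predicted value and returns the value the user confirms. A real system would use a richer learning model and, as the abstract notes, could combine it with a controlled vocabulary.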


Similar Items