Support patient search on pathology reports with interactive online learning based data extraction

BACKGROUND: Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interes...

Bibliographic Details
Main authors: Zheng, Shuai; Lu, James J.; Appin, Christina; Brat, Daniel; Wang, Fusheng
Format: Online Article Text
Language: English
Published: Medknow Publications & Media Pvt Ltd 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4629306/
https://www.ncbi.nlm.nih.gov/pubmed/26605116
http://dx.doi.org/10.4103/2153-3539.166012
_version_ 1782398565696405504
author Zheng, Shuai
Lu, James J.
Appin, Christina
Brat, Daniel
Wang, Fusheng
author_facet Zheng, Shuai
Lu, James J.
Appin, Christina
Brat, Daniel
Wang, Fusheng
author_sort Zheng, Shuai
collection PubMed
description BACKGROUND: Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free-text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. METHODS: We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users’ corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data. RESULTS: We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic marker, and procedure. The system achieves F1 scores of around 95% for the majority of tests.
CONCLUSIONS: Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search.
format Online
Article
Text
id pubmed-4629306
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Medknow Publications & Media Pvt Ltd
record_format MEDLINE/PubMed
spelling pubmed-4629306 2015-11-24 Support patient search on pathology reports with interactive online learning based data extraction Zheng, Shuai Lu, James J. Appin, Christina Brat, Daniel Wang, Fusheng J Pathol Inform Research Article BACKGROUND: Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free-text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often constructed with a fixed training dataset and not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. METHODS: We have developed an online machine learning based information extraction system called IDEAL-X. With its graphical user interface, the system's data extraction engine automatically annotates values for users to review upon loading each report text. The system analyzes users’ corrections regarding these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be applied to conveniently define queries based on extracted structured data.
RESULTS: We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic marker, and procedure. The system achieves F1 scores of around 95% for the majority of tests. CONCLUSIONS: Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning based data extraction and the knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search. Medknow Publications & Media Pvt Ltd 2015-09-28 /pmc/articles/PMC4629306/ /pubmed/26605116 http://dx.doi.org/10.4103/2153-3539.166012 Text en Copyright: © 2015 Journal of Pathology Informatics http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms.
spellingShingle Research Article
Zheng, Shuai
Lu, James J.
Appin, Christina
Brat, Daniel
Wang, Fusheng
Support patient search on pathology reports with interactive online learning based data extraction
title Support patient search on pathology reports with interactive online learning based data extraction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4629306/
https://www.ncbi.nlm.nih.gov/pubmed/26605116
http://dx.doi.org/10.4103/2153-3539.166012