Support patient search on pathology reports with interactive online learning based data extraction
BACKGROUND: Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entries, information in pathology reports remains primarily in narrative free text form. Extracting data of interes...
Main authors: | Zheng, Shuai; Lu, James J.; Appin, Christina; Brat, Daniel; Wang, Fusheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Medknow Publications & Media Pvt Ltd, 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4629306/ https://www.ncbi.nlm.nih.gov/pubmed/26605116 http://dx.doi.org/10.4103/2153-3539.166012 |
_version_ | 1782398565696405504 |
---|---|
author | Zheng, Shuai Lu, James J. Appin, Christina Brat, Daniel Wang, Fusheng |
author_sort | Zheng, Shuai |
collection | PubMed |
description | BACKGROUND: Structural reporting enables semantic understanding and prompt retrieval of clinical findings about patients. While synoptic pathology reporting provides templates for data entry, information in pathology reports remains primarily in narrative free-text form. Extracting data of interest from narrative pathology reports could significantly improve the representation of the information and enable complex structured queries. However, manual extraction is tedious and error-prone, and automated tools are often built on a fixed training dataset and are not easily adaptable. Our goal is to extract data from pathology reports to support advanced patient search with a highly adaptable semi-automated data extraction system, which can adjust and self-improve by learning from a user's interaction with minimal human effort. METHODS: We have developed an online machine learning-based information extraction system called IDEAL-X. Through its graphical user interface, the system's data extraction engine automatically annotates values for users to review as each report text is loaded. The system analyzes users’ corrections to these annotations with online machine learning, and incrementally enhances and refines the learning model as reports are processed. The system also takes advantage of customized controlled vocabularies, which can be adaptively refined during the online learning process to further assist the data extraction. As the accuracy of automatic annotation improves over time, the effort of human annotation is gradually reduced. After all reports are processed, a built-in query engine can be used to conveniently define queries over the extracted structured data. RESULTS: We have evaluated the system with a dataset of anatomic pathology reports from 50 patients. Extracted data elements include demographic data, diagnosis, genetic marker, and procedure. The system achieves F-1 scores of around 95% for the majority of tests.
CONCLUSIONS: Extracting data from pathology reports could enable more accurate knowledge to support biomedical research and clinical diagnosis. IDEAL-X provides a bridge that takes advantage of online machine learning-based data extraction and the knowledge from human feedback. By combining iterative online learning and adaptive controlled vocabularies, IDEAL-X can deliver highly adaptive and accurate data extraction to support patient search. |
format | Online Article Text |
id | pubmed-4629306 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Medknow Publications & Media Pvt Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-4629306 2015-11-24 Support patient search on pathology reports with interactive online learning based data extraction Zheng, Shuai Lu, James J. Appin, Christina Brat, Daniel Wang, Fusheng J Pathol Inform Research Article Medknow Publications & Media Pvt Ltd 2015-09-28 /pmc/articles/PMC4629306/ /pubmed/26605116 http://dx.doi.org/10.4103/2153-3539.166012 Text en Copyright: © 2015 Journal of Pathology Informatics http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as the author is credited and the new creations are licensed under the identical terms. |
title | Support patient search on pathology reports with interactive online learning based data extraction |
topic | Research Article |
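
The interactive loop summarized in the METHODS part of the abstract (annotate a report, let a reviewer correct the annotations, fold the corrections back into a vocabulary) can be illustrated with a minimal Python sketch. This is not IDEAL-X's implementation: the class `OnlineVocabularyExtractor`, the function `f1_score`, the field names, and the sample report strings are all hypothetical, and plain substring matching against a growing vocabulary stands in for the system's online machine learning model.

```python
from collections import defaultdict


class OnlineVocabularyExtractor:
    """Toy extractor (hypothetical): learns field vocabularies from reviewer corrections."""

    def __init__(self):
        # field name -> set of lowercase terms known to signal a value for that field
        self.vocabulary = defaultdict(set)

    def annotate(self, report_text):
        """Return {field: sorted list of known terms found in the report text}."""
        text = report_text.lower()
        found = {}
        for field, terms in self.vocabulary.items():
            hits = sorted(term for term in terms if term in text)
            if hits:
                found[field] = hits
        return found

    def learn(self, corrected):
        """Add the reviewer's corrected values ({field: [values]}) to the vocabulary."""
        for field, values in corrected.items():
            self.vocabulary[field].update(value.lower() for value in values)


def f1_score(predicted, gold):
    """F1 over sets of (field, value) pairs; 0.0 when nothing matches."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    extractor = OnlineVocabularyExtractor()
    # First report: nothing is known yet, so the reviewer supplies the value.
    print(extractor.annotate("Final diagnosis: glioblastoma."))
    extractor.learn({"diagnosis": ["Glioblastoma"]})
    # Later report: the learned term is now annotated automatically.
    print(extractor.annotate("Specimen shows GLIOBLASTOMA."))
```

In this sketch the reviewer's effort shrinks as the vocabulary grows, which mirrors the behavior the abstract describes; `f1_score` shows one common way to score extracted (field, value) pairs against a reviewed gold standard, and the paper's exact evaluation protocol may differ.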