A hybrid solution for extracting structured medical information from unstructured data in medical records via a double-reading/entry system

BACKGROUND: Healthcare providers around the world generate a huge amount of biomedical data, stored either in legacy (paper-based) formats or in electronic medical records (EMR), which are collectively referred to as big biomedical data (BBD). To realize the promise of BBD for clinical use and res...

Bibliographic Details
Main authors: Luo, Ligang, Li, Liping, Hu, Jiajia, Wang, Xiaozhe, Hou, Boulin, Zhang, Tianze, Zhao, Lue Ping
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5006527/
https://www.ncbi.nlm.nih.gov/pubmed/27577240
http://dx.doi.org/10.1186/s12911-016-0357-5
author Luo, Ligang
Li, Liping
Hu, Jiajia
Wang, Xiaozhe
Hou, Boulin
Zhang, Tianze
Zhao, Lue Ping
collection PubMed
description BACKGROUND: Healthcare providers around the world generate a huge amount of biomedical data, stored either in legacy (paper-based) formats or in electronic medical records (EMR), which are collectively referred to as big biomedical data (BBD). To realize the promise of BBD for clinical use and research, an essential step is to extract key data elements from unstructured medical records into patient-centered electronic health records with computable data elements. Our objective is to introduce a novel solution, known as a double-reading/entry system (DRESS), for extracting clinical data from unstructured medical records (MR) and creating a semi-structured electronic health record database, and to demonstrate its reproducibility empirically.
METHODS: Using modern cloud-based technologies, we have developed a comprehensive system comprising multiple subsystems: capturing MRs in clinics, securely transferring MRs, storing and managing cloud-based MRs, facilitating both machine learning and manual reading, and performing iterative quality control before committing the semi-structured data into the desired database. To evaluate the reproducibility of the medical data elements extracted by DRESS, we conducted a blinded reproducibility study with 100 MRs from patients who had undergone surgical treatment of lung cancer in China. The study uses the Kappa statistic to measure concordance of discrete variables and the correlation coefficient to measure reproducibility of continuous variables.
RESULTS: Using DRESS, we have demonstrated the feasibility of extracting clinical data from unstructured MRs to create a semi-structured, patient-centered electronic health record database. The reproducibility study with 100 patients' MRs showed a high overall reproducibility of 98%, which varies across six modules (pathology, radio/chemotherapy, clinical examination, surgery information, medical image and general patient information).
CONCLUSIONS: DRESS uses double reading, double entry, and independent adjudication to manually curate structured data elements from unstructured clinical data. Further, through distributed computing strategies, DRESS protects data privacy by dividing MR data into de-identified modules. Finally, through an internet-based computing cloud, DRESS enables many data specialists to work in a virtual environment to achieve the necessary scale of processing thousands of MRs within days. This hybrid system is probably a workable solution to the big medical data challenge.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12911-016-0357-5) contains supplementary material, which is available to authorized users.
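The abstract names two reproducibility measures (the Kappa statistic for discrete variables, a correlation coefficient for continuous ones) and an adjudication step for double-entered data. The sketch below illustrates how such a comparison between two independent readers might be computed. It is not the authors' implementation: it assumes Cohen's kappa and Pearson's r as the specific statistics, and the function names and sample values are hypothetical.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two equal-length lists of categorical entries.

    Assumption: the abstract says only "Kappa statistic"; Cohen's
    two-rater form is used here for illustration.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must supply the same non-zero number of entries")
    n = len(reader_a)
    # Observed agreement: fraction of records where both entries match.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement from each reader's marginal label frequencies.
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def discordant_records(reader_a, reader_b):
    """Indices where the two entries differ, i.e. candidates for adjudication."""
    return [i for i, (a, b) in enumerate(zip(reader_a, reader_b)) if a != b]


if __name__ == "__main__":
    # Hypothetical double-entered values for six records.
    stage_a = ["I", "II", "II", "III", "I", "II"]
    stage_b = ["I", "II", "III", "III", "I", "II"]
    size_a = [2.0, 3.5, 1.2, 4.0, 2.8, 3.1]
    size_b = [2.0, 3.4, 1.2, 4.1, 2.8, 3.1]

    print("kappa:", cohen_kappa(stage_a, stage_b))
    print("pearson r:", pearson_r(size_a, size_b))
    print("send to adjudicator:", discordant_records(stage_a, stage_b))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a common choice for categorical fields; the discordant indices are the records a third, independent reviewer would resolve in a double-entry workflow.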
format Online
Article
Text
id pubmed-5006527
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5006527 2016-09-01 A hybrid solution for extracting structured medical information from unstructured data in medical records via a double-reading/entry system Luo, Ligang Li, Liping Hu, Jiajia Wang, Xiaozhe Hou, Boulin Zhang, Tianze Zhao, Lue Ping BMC Med Inform Decis Mak Technical Advance BioMed Central 2016-08-30 /pmc/articles/PMC5006527/ /pubmed/27577240 http://dx.doi.org/10.1186/s12911-016-0357-5 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A hybrid solution for extracting structured medical information from unstructured data in medical records via a double-reading/entry system
topic Technical Advance