Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation

BACKGROUND: Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructure...

Bibliographic Details
Main Authors: Liu, Sijia, Wang, Yanshan, Wen, Andrew, Wang, Liwei, Hong, Na, Shen, Feichen, Bedrick, Steven, Hersh, William, Liu, Hongfang
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7576539/
https://www.ncbi.nlm.nih.gov/pubmed/33021486
http://dx.doi.org/10.2196/17376
author Liu, Sijia
Wang, Yanshan
Wen, Andrew
Wang, Liwei
Hong, Na
Shen, Feichen
Bedrick, Steven
Hersh, William
Liu, Hongfang
collection PubMed
description BACKGROUND: Widespread adoption of electronic health records has enabled the secondary use of electronic health record data for clinical research and health care delivery. Natural language processing techniques have shown promise in their capability to extract the information embedded in unstructured clinical data, and information retrieval techniques provide flexible and scalable solutions that can augment natural language processing systems for retrieving and ranking relevant records. OBJECTIVE: In this paper, we present the implementation of a cohort retrieval system that can execute textual cohort selection queries on both structured data and unstructured text—Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records (CREATE). METHODS: CREATE is a proof-of-concept system that leverages a combination of structured queries and information retrieval techniques on natural language processing results to improve cohort retrieval performance using the Observational Medical Outcomes Partnership Common Data Model to enhance model portability. The natural language processing component was used to extract common data model concepts from textual queries. We designed a hierarchical index to support the common data model concept search utilizing information retrieval techniques and frameworks. RESULTS: Our case study on 5 cohort identification queries, evaluated using the precision at 5 information retrieval metric at both the patient-level and document-level, demonstrates that CREATE achieves a mean precision at 5 of 0.90, which outperforms systems using only structured data or only unstructured text with mean precision at 5 values of 0.54 and 0.74, respectively. CONCLUSIONS: The implementation and evaluation of Mayo Clinic Biobank data demonstrated that CREATE outperforms cohort retrieval systems that only use one of either structured data or unstructured text in complex textual cohort queries.
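The abstract reports results as mean precision at 5. As a minimal sketch of how that metric is conventionally computed (not the authors' evaluation code), the Python below assumes each query's ranked results have already been judged relevant or not; the function names and the sample judgments are hypothetical and are not data from the paper.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int = 5) -> float:
    """Fraction of the top-k ranked results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(relevance)[:k]
    # Conventional P@k divides by k even if fewer than k results came back.
    return sum(1 for judged in top_k if judged) / k


def mean_precision_at_k(runs: Sequence[Sequence[bool]], k: int = 5) -> float:
    """Average P@k over several queries (one relevance list per query)."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(precision_at_k(run, k) for run in runs) / len(runs)


# Hypothetical relevance judgments for two queries, in ranked order.
example_runs = [
    [True, True, True, True, True],
    [True, True, False, True, True],
]
example_mean = mean_precision_at_k(example_runs, k=5)
```

Under this definition, a mean precision at 5 of 0.90 across 5 queries means that, on average, 4.5 of the top 5 results per query were judged relevant.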
format Online
Article
Text
id pubmed-7576539
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7576539 2020-10-27 JMIR Med Inform Original Paper JMIR Publications 2020-10-06 /pmc/articles/PMC7576539/ /pubmed/33021486 http://dx.doi.org/10.2196/17376 Text en ©Sijia Liu, Yanshan Wang, Andrew Wen, Liwei Wang, Na Hong, Feichen Shen, Steven Bedrick, William Hersh, Hongfang Liu. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 06.10.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7576539/
https://www.ncbi.nlm.nih.gov/pubmed/33021486
http://dx.doi.org/10.2196/17376