Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system

BACKGROUND: The American Society of Clinical Oncology recently launched the minimal common oncology data elements project to facilitate cancer data interoperability. However, clinical data are often not recorded in an organized way, and converting them into a structured format can be time-consuming....

Bibliographic Details
Main authors: Jung, Hyun Ae, Jeong, Oksoon, Chang, Dong Kyung, Park, Sehhoon, Sun, Jong-Mu, Lee, Se-Hoon, Ahn, Jin Seok, Ahn, Myung-Ju, Park, Keunchil
Format: Online Article Text
Language: English
Published: AME Publishing Company 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577969/
https://www.ncbi.nlm.nih.gov/pubmed/34858777
http://dx.doi.org/10.21037/tlcr-21-531
author Jung, Hyun Ae
Jeong, Oksoon
Chang, Dong Kyung
Park, Sehhoon
Sun, Jong-Mu
Lee, Se-Hoon
Ahn, Jin Seok
Ahn, Myung-Ju
Park, Keunchil
collection PubMed
description BACKGROUND: The American Society of Clinical Oncology recently launched the minimal common oncology data elements project to facilitate cancer data interoperability. However, clinical data are often not recorded in an organized way, and converting them into a structured format can be time-consuming. A Clinical Data Warehouse (CDW) is a database that consolidates data from different clinical sources. However, the clinical data extracted from this database include not only structured data but also natural language generated during clinical practice. Applying these data to a clinical study is therefore challenging, because they are unstructured and not formatted in a way that allows essential content to be found. This study determined how best to organize a huge amount of clinical data to evaluate the clinical features and outcomes of upper aerodigestive tract cancers, including cancers of the head and neck, esophagus, lung, and thymus, and mesothelioma. METHODS: The Real-time autOmatically updated data warehOuse in healThcare (ROOT) uses six main regions to describe the journey of cancer patients. This study developed an algorithm optimized for each disease category using natural language processing of unstructured data and data capture of structured data. Data from patients diagnosed at the Samsung Medical Center from 2008 to 2020 were used. RESULTS: Comprehensive clinical data were collected for 67,617 patients across six tumor types: 28,954 with non-small-cell lung cancer, 2,540 with small-cell lung cancer, 30,035 with head and neck cancer, 4,950 with esophageal cancer, 966 with thymic cancer, and 172 with mesothelioma. Additionally, the results of longitudinal molecular studies, including epidermal growth factor receptor (EGFR) mutations, anaplastic lymphoma kinase (ALK) tests, and next-generation sequencing (NGS), were included. Scattered information was integrated and automatically built up to match the cohort, allowing users to capture the most recent test results and treatment outcomes. CONCLUSIONS: This landmark study documented the successful construction of a real-time updating system for medical big data, based on the CDW program.
format Online
Article
Text
id pubmed-8577969
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher AME Publishing Company
record_format MEDLINE/PubMed
spelling pubmed-8577969 2021-12-01 Transl Lung Cancer Res Original Article AME Publishing Company 2021-10 /pmc/articles/PMC8577969/ /pubmed/34858777 http://dx.doi.org/10.21037/tlcr-21-531 Text en 2021 Translational Lung Cancer Research. All rights reserved. https://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
title Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system
topic Original Article