Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system
BACKGROUND: The American Society for Clinical Oncology recently launched the minimal common oncology data elements project to facilitate cancer data interoperability. However, clinical data are often not recorded in an organized way, and converting them into a structured format can be time-consuming....
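Main Authors: | Jung, Hyun Ae; Jeong, Oksoon; Chang, Dong Kyung; Park, Sehhoon; Sun, Jong-Mu; Lee, Se-Hoon; Ahn, Jin Seok; Ahn, Myung-Ju; Park, Keunchil |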
---|---|
Format: | Online Article Text |
Language: | English |
Published: | AME Publishing Company 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577969/ https://www.ncbi.nlm.nih.gov/pubmed/34858777 http://dx.doi.org/10.21037/tlcr-21-531 |
_version_ | 1784596174319124480 |
---|---|
author | Jung, Hyun Ae; Jeong, Oksoon; Chang, Dong Kyung; Park, Sehhoon; Sun, Jong-Mu; Lee, Se-Hoon; Ahn, Jin Seok; Ahn, Myung-Ju; Park, Keunchil |
author_facet | Jung, Hyun Ae; Jeong, Oksoon; Chang, Dong Kyung; Park, Sehhoon; Sun, Jong-Mu; Lee, Se-Hoon; Ahn, Jin Seok; Ahn, Myung-Ju; Park, Keunchil |
author_sort | Jung, Hyun Ae |
collection | PubMed |
description | BACKGROUND: The American Society for Clinical Oncology recently launched the minimal common oncology data elements project to facilitate cancer data interoperability. However, clinical data are often not recorded in an organized way, and converting them into a structured format can be time-consuming. A Clinical Data Warehouse (CDW) is a database that consolidates data from different clinical sources. However, the clinical data extracted from this database include not only structured data but also natural language generated during clinical practice. Applying these data to a clinical study is therefore challenging because they are unstructured and not formatted in a way that allows essential content to be found. This study determined how best to organize a huge amount of clinical data to evaluate the clinical features and outcomes of upper aerodigestive tract cancers, including cancer of the head and neck, esophagus, lung, and thymus, and mesothelioma. METHODS: The Real-time autOmatically updated data warehOuse in healThcare (ROOT) uses six main regions to describe the journey of cancer patients. This study developed an algorithm optimized for each disease category, using natural language processing of unstructured data and data capture of structured data. Data from patients diagnosed at the Samsung Medical Center from 2008 to 2020 were used. RESULTS: Comprehensive clinical data were collected for 67,617 patients across six tumor types: 28,954 with non-small-cell lung cancer, 2,540 with small-cell lung cancer, 30,035 with head and neck cancer, 4,950 with esophageal cancer, 966 with thymic cancer, and 172 with mesothelioma. Additionally, the results of a longitudinal molecular study, including epidermal growth factor receptor (EGFR) mutations, anaplastic lymphoma kinase (ALK) tests, and next-generation sequencing (NGS), were included. Scattered information was integrated and automatically built up to match the cohort, allowing users to capture the most up-to-date test results and treatment outcomes. CONCLUSIONS: This landmark study documented the successful construction of a real-time updating system for medical big data, based on the CDW program. |
format | Online Article Text |
id | pubmed-8577969 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | AME Publishing Company |
record_format | MEDLINE/PubMed |
spelling | pubmed-8577969 2021-12-01 Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system Jung, Hyun Ae; Jeong, Oksoon; Chang, Dong Kyung; Park, Sehhoon; Sun, Jong-Mu; Lee, Se-Hoon; Ahn, Jin Seok; Ahn, Myung-Ju; Park, Keunchil Transl Lung Cancer Res Original Article BACKGROUND: The American Society for Clinical Oncology recently launched the minimal common oncology data elements project to facilitate cancer data interoperability. However, clinical data are often not recorded in an organized way, and converting them into a structured format can be time-consuming. A Clinical Data Warehouse (CDW) is a database that consolidates data from different clinical sources. However, the clinical data extracted from this database include not only structured data but also natural language generated during clinical practice. Applying these data to a clinical study is therefore challenging because they are unstructured and not formatted in a way that allows essential content to be found. This study determined how best to organize a huge amount of clinical data to evaluate the clinical features and outcomes of upper aerodigestive tract cancers, including cancer of the head and neck, esophagus, lung, and thymus, and mesothelioma. METHODS: The Real-time autOmatically updated data warehOuse in healThcare (ROOT) uses six main regions to describe the journey of cancer patients. This study developed an algorithm optimized for each disease category, using natural language processing of unstructured data and data capture of structured data. Data from patients diagnosed at the Samsung Medical Center from 2008 to 2020 were used. RESULTS: Comprehensive clinical data were collected for 67,617 patients across six tumor types: 28,954 with non-small-cell lung cancer, 2,540 with small-cell lung cancer, 30,035 with head and neck cancer, 4,950 with esophageal cancer, 966 with thymic cancer, and 172 with mesothelioma. Additionally, the results of a longitudinal molecular study, including epidermal growth factor receptor (EGFR) mutations, anaplastic lymphoma kinase (ALK) tests, and next-generation sequencing (NGS), were included. Scattered information was integrated and automatically built up to match the cohort, allowing users to capture the most up-to-date test results and treatment outcomes. CONCLUSIONS: This landmark study documented the successful construction of a real-time updating system for medical big data, based on the CDW program. AME Publishing Company 2021-10 /pmc/articles/PMC8577969/ /pubmed/34858777 http://dx.doi.org/10.21037/tlcr-21-531 Text en 2021 Translational Lung Cancer Research. All rights reserved. https://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/ |
spellingShingle | Original Article Jung, Hyun Ae Jeong, Oksoon Chang, Dong Kyung Park, Sehhoon Sun, Jong-Mu Lee, Se-Hoon Ahn, Jin Seok Ahn, Myung-Ju Park, Keunchil Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system |
title | Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system |
title_full | Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system |
title_fullStr | Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system |
title_full_unstemmed | Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system |
title_short | Real-time autOmatically updated data warehOuse in healThcare (ROOT): an innovative and automated data collection system |
title_sort | real-time automatically updated data warehouse in healthcare (root): an innovative and automated data collection system |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8577969/ https://www.ncbi.nlm.nih.gov/pubmed/34858777 http://dx.doi.org/10.21037/tlcr-21-531 |
work_keys_str_mv | AT junghyunae realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem AT jeongoksoon realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem AT changdongkyung realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem AT parksehhoon realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem AT sunjongmu realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem AT leesehoon realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem AT ahnjinseok realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem AT ahnmyungju realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem AT parkkeunchil realtimeautomaticallyupdateddatawarehouseinhealthcarerootaninnovativeandautomateddatacollectionsystem |
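
The RESULTS figures in the description can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the counts and the total are copied from the abstract, while the names `COHORT_COUNTS`, `REPORTED_TOTAL`, `cohort_total`, and `cohort_shares` are hypothetical and not part of the ROOT system or of this record.

```python
# Per-tumor-type patient counts as reported in the RESULTS section of the abstract.
# The dictionary and function names are illustrative assumptions, not ROOT identifiers.
COHORT_COUNTS = {
    "non-small-cell lung cancer": 28_954,
    "small-cell lung cancer": 2_540,
    "head and neck cancer": 30_035,
    "esophageal cancer": 4_950,
    "thymic cancer": 966,
    "mesothelioma": 172,
}

# Total cohort size as reported in the abstract.
REPORTED_TOTAL = 67_617


def cohort_total(counts: dict[str, int]) -> int:
    """Sum the per-tumor-type counts."""
    return sum(counts.values())


def cohort_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each tumor type's fraction of the whole cohort."""
    total = cohort_total(counts)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # The six subgroup counts should add up to the reported cohort size.
    assert cohort_total(COHORT_COUNTS) == REPORTED_TOTAL
    for name, share in cohort_shares(COHORT_COUNTS).items():
        print(f"{name}: {share:.1%}")
```

The six subgroup counts add up to the reported 67,617 patients, so the figures in the record are internally consistent.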