An ontology-based documentation of data discovery and integration process in cancer outcomes research

Bibliographic Details
Main authors: Zhang, Hansi, Guo, Yi, Prosperi, Mattia, Bian, Jiang
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7734720/
https://www.ncbi.nlm.nih.gov/pubmed/33317497
http://dx.doi.org/10.1186/s12911-020-01270-3
author Zhang, Hansi
Guo, Yi
Prosperi, Mattia
Bian, Jiang
collection PubMed
description BACKGROUND: To reduce cancer mortality and improve cancer outcomes, it is critical to understand the various cancer risk factors (RFs) across different domains (e.g., genetic, environmental, and behavioral risk factors) and levels (e.g., individual, interpersonal, and community levels). However, prior research on RFs of cancer outcomes has primarily focused on individual-level RFs because of the lack of integrated datasets that contain multi-level, multi-domain RFs. Further, the lack of consensus and proper guidance on how to systematically identify RFs increases the difficulty of RF selection from heterogeneous data sources in a multi-level integrative data analysis (mIDA) study. More importantly, because mIDA studies require integrating heterogeneous data sources, the data integration processes in the limited number of existing mIDA studies are inconsistently performed and poorly documented, which threatens transparency and reproducibility.
METHODS: Informed by the National Institute on Minority Health and Health Disparities (NIMHD) research framework, we (1) reviewed existing reporting guidelines from the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) network and (2) developed a theory-driven reporting guideline to guide the RF variable selection, data source selection, and data integration process. We then developed an ontology to standardize the documentation of the RF selection and data integration process in mIDA studies.
RESULTS: We summarized the review results and created a reporting guideline, ATTEST, for reporting the variable selection, data source selection, and data integration process. We provided an ATTEST checklist to help researchers annotate and clearly document each step of their mIDA studies to ensure transparency and reproducibility. We used ATTEST to report two mIDA case studies and further transformed the annotation results into semantic triples, so that the relationships among variables, data sources, and integration processes are explicitly standardized and modeled using the classes and properties from OD-ATTEST.
CONCLUSION: Our ontology-based reporting guideline solves some key challenges in current mIDA studies for cancer outcomes research by providing (1) theory-driven guidance for multi-level and multi-domain RF variable and data source selection; and (2) standardized documentation of the data selection and integration processes powered by an ontology, thus enabling the sharing of mIDA study reports among researchers.
format Online
Article
Text
id pubmed-7734720
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7734720 2020-12-15
An ontology-based documentation of data discovery and integration process in cancer outcomes research
Zhang, Hansi; Guo, Yi; Prosperi, Mattia; Bian, Jiang
BMC Med Inform Decis Mak
Research
BioMed Central 2020-12-14
/pmc/articles/PMC7734720/ /pubmed/33317497 http://dx.doi.org/10.1186/s12911-020-01270-3
Text en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title An ontology-based documentation of data discovery and integration process in cancer outcomes research
topic Research