A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models

BACKGROUND: Recently, high-throughput technologies have been used extensively alongside clinical tests to study various types of cancer. Data generated in such large-scale studies are heterogeneous, of different types and formats. In the absence of effective integration strategies, novel models are necessary...

Bibliographic Details
Main Authors: Mihaylov, Iliyan, Kańduła, Maciej, Krachunov, Milko, Vassilev, Dimitar
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6868770/
https://www.ncbi.nlm.nih.gov/pubmed/31752974
http://dx.doi.org/10.1186/s13062-019-0249-6
_version_ 1783472339552305152
author Mihaylov, Iliyan
Kańduła, Maciej
Krachunov, Milko
Vassilev, Dimitar
collection PubMed
description BACKGROUND: Recently, high-throughput technologies have been used extensively alongside clinical tests to study various types of cancer. Data generated in such large-scale studies are heterogeneous, of different types and formats. In the absence of effective integration strategies, novel models are necessary for efficient and operative data integration, where both clinical and molecular information can be effectively joined for storage, access and ease of use. Such models, combined with machine learning methods for accurate prediction of survival time in cancer studies, can yield novel insights into disease development and lead to precise personalized therapies. RESULTS: We developed an approach for intelligent data integration of two cancer datasets (breast cancer and neuroblastoma), provided in the CAMDA 2018 ‘Cancer Data Integration Challenge’, and compared models for prediction of survival time. We developed a novel semantic network-based data integration framework that utilizes NoSQL databases, where we combined clinical and expression profile data, using both raw data records and external knowledge sources. Utilizing the integrated data, we introduced the Tumor Integrated Clinical Feature (TICF), a new feature for accurate prediction of patient survival time. Finally, we applied and validated several machine learning models for survival time prediction. CONCLUSION: We developed a framework for semantic integration of clinical and omics data that can borrow information across multiple cancer studies. By linking data with external domain knowledge sources, our approach facilitates enrichment of the studied data by discovery of internal relations. The proposed and validated machine learning models for survival time prediction yielded accurate results. REVIEWERS: This article was reviewed by Eran Elhaik, Wenzhong Xiao and Carlos Loucera.
format Online
Article
Text
id pubmed-6868770
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6868770 2019-12-12 A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models Mihaylov, Iliyan Kańduła, Maciej Krachunov, Milko Vassilev, Dimitar Biol Direct Research BioMed Central 2019-11-21 /pmc/articles/PMC6868770/ /pubmed/31752974 http://dx.doi.org/10.1186/s13062-019-0249-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A novel framework for horizontal and vertical data integration in cancer studies with application to survival time prediction models
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6868770/
https://www.ncbi.nlm.nih.gov/pubmed/31752974
http://dx.doi.org/10.1186/s13062-019-0249-6