
Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB)

Bibliographic Details
Main authors: Krempel, Rasmus; Kulkarni, Pranav; Yim, Annie; Lang, Ulrich; Habermann, Bianca; Frommolt, Peter
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5921751/
https://www.ncbi.nlm.nih.gov/pubmed/29699486
http://dx.doi.org/10.1186/s12859-018-2157-7
author Krempel, Rasmus
Kulkarni, Pranav
Yim, Annie
Lang, Ulrich
Habermann, Bianca
Frommolt, Peter
collection PubMed
description BACKGROUND: Recent cancer genome studies on many human cancer types have relied on multiple molecular high-throughput technologies. Given the vast amount of data that has been generated, there are surprisingly few databases which facilitate access to these data and make them available for flexible analysis queries in the broad research community. If used in their entirety and provided at a high structural level, these data can be directed into constantly increasing databases which bear an enormous potential to serve as a basis for machine learning technologies with the goal to support research and healthcare with predictions of clinically relevant traits. RESULTS: We have developed the Cancer Systems Biology Database (CancerSysDB), a resource for highly flexible queries and analysis of cancer-related data across multiple data types and multiple studies. The CancerSysDB can be adopted by any center for the organization of their locally acquired data and its integration with publicly available data from multiple studies. A publicly available main instance of the CancerSysDB can be used to obtain highly flexible queries across multiple data types as shown by highly relevant use cases. In addition, we demonstrate how the CancerSysDB can be used for predictive cancer classification based on whole-exome data from 9091 patients in The Cancer Genome Atlas (TCGA) research network. CONCLUSIONS: Our database bears the potential to be used for large-scale integrative queries and predictive analytics of clinically relevant traits. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2157-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5921751
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5921751 2018-05-01 Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB) Krempel, Rasmus Kulkarni, Pranav Yim, Annie Lang, Ulrich Habermann, Bianca Frommolt, Peter BMC Bioinformatics Research Article
BioMed Central 2018-04-24 /pmc/articles/PMC5921751/ /pubmed/29699486 http://dx.doi.org/10.1186/s12859-018-2157-7 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB)
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5921751/
https://www.ncbi.nlm.nih.gov/pubmed/29699486
http://dx.doi.org/10.1186/s12859-018-2157-7