
An integrated network representation of multiple cancer-specific data for graph-based machine learning

Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, the accurat...


Bibliographic Details
Main Authors: Pu, Limeng, Singha, Manali, Wu, Hsiao-Chun, Busch, Costas, Ramanujam, J., Brylinski, Michal
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9054771/
https://www.ncbi.nlm.nih.gov/pubmed/35487924
http://dx.doi.org/10.1038/s41540-022-00226-9
author Pu, Limeng
Singha, Manali
Wu, Hsiao-Chun
Busch, Costas
Ramanujam, J.
Brylinski, Michal
author_sort Pu, Limeng
collection PubMed
description Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, the accurate prediction of the effect of pharmacotherapy on a specific cell line based on the genetic information alone is problematic. Emphasizing the system-level complexity of cancer, we devised a procedure to integrate multiple heterogeneous data, including biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. In order to construct compact, yet information-rich cancer-specific networks, we developed a novel graph reduction algorithm. Driven not only by the topological information but also by the biological knowledge, the graph reduction increases the feature-only entropy while preserving the valuable graph-feature information. Subsequent comparative benchmarking simulations employing a tissue-level cross-validation protocol demonstrate that the accuracy of a graph-based predictor of drug efficacy is 0.68, which is notably higher than those measured for more traditional, matrix-based techniques on the same data. Overall, the non-Euclidean representation of the cancer-specific data improves the performance of machine learning to predict the response of cancer to pharmacotherapy. The generated data are freely available to the academic community at https://osf.io/dzx7b/.
format Online
Article
Text
id pubmed-9054771
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9054771 2022-05-01 An integrated network representation of multiple cancer-specific data for graph-based machine learning Pu, Limeng Singha, Manali Wu, Hsiao-Chun Busch, Costas Ramanujam, J. Brylinski, Michal NPJ Syst Biol Appl Article Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, the accurate prediction of the effect of pharmacotherapy on a specific cell line based on the genetic information alone is problematic. Emphasizing the system-level complexity of cancer, we devised a procedure to integrate multiple heterogeneous data, including biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. In order to construct compact, yet information-rich cancer-specific networks, we developed a novel graph reduction algorithm. Driven not only by the topological information but also by the biological knowledge, the graph reduction increases the feature-only entropy while preserving the valuable graph-feature information. Subsequent comparative benchmarking simulations employing a tissue-level cross-validation protocol demonstrate that the accuracy of a graph-based predictor of drug efficacy is 0.68, which is notably higher than those measured for more traditional, matrix-based techniques on the same data. Overall, the non-Euclidean representation of the cancer-specific data improves the performance of machine learning to predict the response of cancer to pharmacotherapy. The generated data are freely available to the academic community at https://osf.io/dzx7b/.
Nature Publishing Group UK 2022-04-29 /pmc/articles/PMC9054771/ /pubmed/35487924 http://dx.doi.org/10.1038/s41540-022-00226-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title An integrated network representation of multiple cancer-specific data for graph-based machine learning
topic Article