COmic: convolutional kernel networks for interpretable end-to-end learning on (multi-)omics data
MOTIVATION: The size of available omics datasets is steadily increasing with technological advancement in recent years. While this increase in sample size can be used to improve the performance of relevant prediction tasks in healthcare, models that are optimized for large datasets usually operate a...
Main Authors: | Ditz, Jonas C; Reuter, Bernhard; Pfeifer, Nico |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Biomedical Informatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311322/ https://www.ncbi.nlm.nih.gov/pubmed/37387152 http://dx.doi.org/10.1093/bioinformatics/btad204 |
_version_ | 1785066718941413376 |
---|---|
author | Ditz, Jonas C; Reuter, Bernhard; Pfeifer, Nico |
author_sort | Ditz, Jonas C |
collection | PubMed |
description | MOTIVATION: The size of available omics datasets is steadily increasing with technological advancement in recent years. While this increase in sample size can be used to improve the performance of relevant prediction tasks in healthcare, models that are optimized for large datasets usually operate as black boxes. In high-stakes scenarios, like healthcare, using a black-box model poses safety and security issues. Without an explanation about molecular factors and phenotypes that affected the prediction, healthcare providers are left with no choice but to blindly trust the models. We propose a new type of artificial neural network, named Convolutional Omics Kernel Network (COmic). By combining convolutional kernel networks with pathway-induced kernels, our method enables robust and interpretable end-to-end learning on omics datasets ranging in size from a few hundred to several hundreds of thousands of samples. Furthermore, COmic can be easily adapted to utilize multiomics data. RESULTS: We evaluated the performance capabilities of COmic on six different breast cancer cohorts. Additionally, we trained COmic models on multiomics data using the METABRIC cohort. Our models performed either better or similar to competitors on both tasks. We show how the use of pathway-induced Laplacian kernels opens the black-box nature of neural networks and results in intrinsically interpretable models that eliminate the need for post hoc explanation models. AVAILABILITY AND IMPLEMENTATION: Datasets, labels, and pathway-induced graph Laplacians used for the single-omics tasks can be downloaded at https://ibm.ent.box.com/s/ac2ilhyn7xjj27r0xiwtom4crccuobst/folder/48027287036. While datasets and graph Laplacians for the METABRIC cohort can be downloaded from the above mentioned repository, the labels have to be downloaded from cBioPortal at https://www.cbioportal.org/study/clinicalData?id=brca_metabric. COmic source code as well as all scripts necessary to reproduce the experiments and analysis are publicly available at https://github.com/jditz/comics. |
format | Online Article Text |
id | pubmed-10311322 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10311322 2023-07-01 COmic: convolutional kernel networks for interpretable end-to-end learning on (multi-)omics data Ditz, Jonas C Reuter, Bernhard Pfeifer, Nico Bioinformatics Biomedical Informatics MOTIVATION: The size of available omics datasets is steadily increasing with technological advancement in recent years. While this increase in sample size can be used to improve the performance of relevant prediction tasks in healthcare, models that are optimized for large datasets usually operate as black boxes. In high-stakes scenarios, like healthcare, using a black-box model poses safety and security issues. Without an explanation about molecular factors and phenotypes that affected the prediction, healthcare providers are left with no choice but to blindly trust the models. We propose a new type of artificial neural network, named Convolutional Omics Kernel Network (COmic). By combining convolutional kernel networks with pathway-induced kernels, our method enables robust and interpretable end-to-end learning on omics datasets ranging in size from a few hundred to several hundreds of thousands of samples. Furthermore, COmic can be easily adapted to utilize multiomics data. RESULTS: We evaluated the performance capabilities of COmic on six different breast cancer cohorts. Additionally, we trained COmic models on multiomics data using the METABRIC cohort. Our models performed either better or similar to competitors on both tasks. We show how the use of pathway-induced Laplacian kernels opens the black-box nature of neural networks and results in intrinsically interpretable models that eliminate the need for post hoc explanation models. AVAILABILITY AND IMPLEMENTATION: Datasets, labels, and pathway-induced graph Laplacians used for the single-omics tasks can be downloaded at https://ibm.ent.box.com/s/ac2ilhyn7xjj27r0xiwtom4crccuobst/folder/48027287036. While datasets and graph Laplacians for the METABRIC cohort can be downloaded from the above mentioned repository, the labels have to be downloaded from cBioPortal at https://www.cbioportal.org/study/clinicalData?id=brca_metabric. COmic source code as well as all scripts necessary to reproduce the experiments and analysis are publicly available at https://github.com/jditz/comics. Oxford University Press 2023-06-30 /pmc/articles/PMC10311322/ /pubmed/37387152 http://dx.doi.org/10.1093/bioinformatics/btad204 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Biomedical Informatics Ditz, Jonas C Reuter, Bernhard Pfeifer, Nico COmic: convolutional kernel networks for interpretable end-to-end learning on (multi-)omics data |
title | COmic: convolutional kernel networks for interpretable end-to-end learning on (multi-)omics data |
title_sort | comic: convolutional kernel networks for interpretable end-to-end learning on (multi-)omics data |
topic | Biomedical Informatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311322/ https://www.ncbi.nlm.nih.gov/pubmed/37387152 http://dx.doi.org/10.1093/bioinformatics/btad204 |
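
The description above refers to pathway-induced Laplacian kernels without giving a formula. As a rough illustration only, the sketch below assumes one common formulation, in which the similarity of two samples restricted to the genes of a pathway is the bilinear form k(x, y) = xᵀLy with L the graph Laplacian of the pathway. The function names, the 3-gene toy pathway, and the use of the unnormalized Laplacian are all hypothetical choices for this example and are not taken from the COmic source code.

```python
import numpy as np


def graph_laplacian(adjacency: np.ndarray) -> np.ndarray:
    """Unnormalized Laplacian L = D - A of an undirected graph (assumed form)."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def pathway_kernel(x: np.ndarray, y: np.ndarray, laplacian: np.ndarray) -> float:
    """Bilinear form k(x, y) = x^T L y over the genes of a single pathway."""
    return float(x @ laplacian @ y)


# Hypothetical 3-gene pathway shaped like a chain: gene0 - gene1 - gene2.
adjacency = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
laplacian = graph_laplacian(adjacency)

# Made-up expression values of the three genes in two samples.
x = np.array([1.0, 2.0, 4.0])
y = np.array([0.5, 0.5, 3.0])

k_xy = pathway_kernel(x, y, laplacian)
k_yx = pathway_kernel(y, x, laplacian)
k_xx = pathway_kernel(x, x, laplacian)
```

With this form, k(x, x) equals the sum of squared expression differences across pathway edges, so each kernel value can be read in terms of one named pathway; this is one way such kernels might support interpretation, under the assumptions stated above.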