MultiCapsNet: A General Framework for Data Integration and Interpretable Classification

Bibliographic Details
Main authors:	Wang, Lifei; Miao, Xuexia; Nie, Rui; Zhang, Zhang; Zhang, Jiang; Cai, Jun
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8652257/ https://www.ncbi.nlm.nih.gov/pubmed/34899854 http://dx.doi.org/10.3389/fgene.2021.767602

collection	PubMed
description	Recent progress in experimental biology has generated large amounts of data with different formats and lengths. Deep learning is an ideal tool for handling complex datasets, but its inherent “black box” nature calls for greater interpretability. At the same time, traditional interpretable machine learning methods, such as linear regression or random forest, can only handle numerical features, not the modular features often encountered in biology. Here, we present MultiCapsNet (https://github.com/wanglf19/MultiCapsNet), a new deep learning model built on CapsNet and scCapsNet, which offers easy data integration and high model interpretability. To demonstrate its ability as an interpretable classifier for modular inputs, we tested MultiCapsNet on three datasets with different data types and application scenarios. First, on a labeled variant call dataset, MultiCapsNet shows classification performance similar to that of a neural network model, and it provides importance scores for data sources directly, without the extra importance-determination step that the neural network model requires. The importance scores generated by the two models are highly correlated. Second, on a single-cell RNA sequencing (scRNA-seq) dataset, MultiCapsNet integrates information about protein-protein interactions (PPI) and protein-DNA interactions (PDI). The classification accuracy of MultiCapsNet is comparable to that of the neural network and random forest models. Meanwhile, MultiCapsNet reveals how each transcription factor (TF) or PPI cluster node contributes to cell type classification. Third, we compared MultiCapsNet with SCENIC. Several cell-type-relevant TFs were identified by both methods, further demonstrating the validity and interpretability of MultiCapsNet.
id	pubmed-8652257
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-8652257 2021-12-09 Front Genet Genetics Frontiers Media S.A. 2022-01-18 /pmc/articles/PMC8652257/ /pubmed/34899854 http://dx.doi.org/10.3389/fgene.2021.767602 Text en Copyright © 2022 Wang, Miao, Nie, Zhang, Zhang and Cai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.