Integrating multimodal data through interpretable heterogeneous ensembles

MOTIVATION: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular,...

Bibliographic Details
Main Authors: Li, Yan Chak; Wang, Linhua; Law, Jeffrey N; Murali, T M; Pandey, Gaurav
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9495448/
https://www.ncbi.nlm.nih.gov/pubmed/36158455
http://dx.doi.org/10.1093/bioadv/vbac065
author Li, Yan Chak
Wang, Linhua
Law, Jeffrey N
Murali, T M
Pandey, Gaurav
author_sort Li, Yan Chak
collection PubMed
description MOTIVATION: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems.
RESULTS: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms and uses heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data and mortality due to coronavirus disease 2019 (COVID-19) from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling.
AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
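
The late integration idea in the description (local models per modality, then an ensemble over those models) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the modality names `labs` and `vitals`, the toy data, the two base learners (nearest centroid and k-nearest neighbours), and the use of plain score averaging as a stand-in for the heterogeneous ensemble step. The authors' code is at the GitHub URL given above.

```python
from math import exp
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid_model(rows, labels):
    """Local model 1 (illustrative): score rises as a row nears the positive centroid."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    c_pos = [mean(col) for col in zip(*pos)]
    c_neg = [mean(col) for col in zip(*neg)]

    def predict(row):
        # Logistic squashing of the distance difference keeps the score in [0, 1].
        return 1.0 / (1.0 + exp(sq_dist(row, c_pos) - sq_dist(row, c_neg)))

    return predict


def fit_knn_model(rows, labels, k=3):
    """Local model 2 (illustrative): fraction of positives among the k nearest rows."""
    def predict(row):
        nearest = sorted(zip(rows, labels), key=lambda rl: sq_dist(row, rl[0]))[:k]
        return mean(y for _, y in nearest)

    return predict


def fit_late_integration(modalities, labels):
    """Train every base learner on every modality separately, then average their scores."""
    local_models = [
        (name, fit(rows, labels))
        for name, rows in modalities.items()
        for fit in (fit_centroid_model, fit_knn_model)
    ]

    def predict(sample):
        # Each local model only ever sees the features of its own modality.
        return mean(model(sample[name]) for name, model in local_models)

    return predict


# Hypothetical toy data: two modalities describing the same six training samples.
train = {
    "labs": [[1.0, 0.9], [0.8, 1.1], [1.2, 1.0], [3.0, 3.1], [2.9, 2.8], [3.2, 3.0]],
    "vitals": [[0.2], [0.1], [0.3], [0.9], [0.8], [1.0]],
}
labels = [0, 0, 0, 1, 1, 1]

ensemble = fit_late_integration(train, labels)
high_risk = ensemble({"labs": [3.1, 2.9], "vitals": [0.95]})
low_risk = ensemble({"labs": [1.0, 1.0], "vitals": [0.15]})
```

Because the modalities are never concatenated into one feature table, each modality can have a different number and type of features; only the local models' scores are combined. Averaging is the simplest possible combiner and is used here only to keep the sketch short.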
format Online
Article
Text
id pubmed-9495448
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9495448 2022-09-23 Integrating multimodal data through interpretable heterogeneous ensembles Li, Yan Chak Wang, Linhua Law, Jeffrey N Murali, T M Pandey, Gaurav Bioinform Adv Original Article Oxford University Press 2022-09-12 /pmc/articles/PMC9495448/ /pubmed/36158455 http://dx.doi.org/10.1093/bioadv/vbac065 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Article
Li, Yan Chak
Wang, Linhua
Law, Jeffrey N
Murali, T M
Pandey, Gaurav
Integrating multimodal data through interpretable heterogeneous ensembles
title Integrating multimodal data through interpretable heterogeneous ensembles
title_sort integrating multimodal data through interpretable heterogeneous ensembles
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9495448/
https://www.ncbi.nlm.nih.gov/pubmed/36158455
http://dx.doi.org/10.1093/bioadv/vbac065