Integrating multimodal data through interpretable heterogeneous ensembles
MOTIVATION: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular,...
Main authors: | Li, Yan Chak; Wang, Linhua; Law, Jeffrey N.; Murali, T. M.; Pandey, Gaurav |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9347276/ https://www.ncbi.nlm.nih.gov/pubmed/35923321 http://dx.doi.org/10.1101/2020.05.29.123497 |
_version_ | 1784761829021450240 |
---|---|
author | Li, Yan Chak Wang, Linhua Law, Jeffrey N. Murali, T. M. Pandey, Gaurav |
author_facet | Li, Yan Chak Wang, Linhua Law, Jeffrey N. Murali, T. M. Pandey, Gaurav |
author_sort | Li, Yan Chak |
collection | PubMed |
description | MOTIVATION: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities, but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. RESULTS: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms, and uses effective heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data, and mortality due to COVID-19 from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen (BUN) and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. AVAILABILITY: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration. |
format | Online Article Text |
id | pubmed-9347276 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-9347276 2022-08-04 Integrating multimodal data through interpretable heterogeneous ensembles Li, Yan Chak Wang, Linhua Law, Jeffrey N. Murali, T. M. Pandey, Gaurav bioRxiv Article MOTIVATION: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities, but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. RESULTS: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms, and uses effective heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data, and mortality due to COVID-19 from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen (BUN) and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. AVAILABILITY: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration. Cold Spring Harbor Laboratory 2022-07-25 /pmc/articles/PMC9347276/ /pubmed/35923321 http://dx.doi.org/10.1101/2020.05.29.123497 Text en https://creativecommons.org/licenses/by-nc/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Li, Yan Chak Wang, Linhua Law, Jeffrey N. Murali, T. M. Pandey, Gaurav Integrating multimodal data through interpretable heterogeneous ensembles |
title | Integrating multimodal data through interpretable heterogeneous ensembles |
title_full | Integrating multimodal data through interpretable heterogeneous ensembles |
title_fullStr | Integrating multimodal data through interpretable heterogeneous ensembles |
title_full_unstemmed | Integrating multimodal data through interpretable heterogeneous ensembles |
title_short | Integrating multimodal data through interpretable heterogeneous ensembles |
title_sort | integrating multimodal data through interpretable heterogeneous ensembles |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9347276/ https://www.ncbi.nlm.nih.gov/pubmed/35923321 http://dx.doi.org/10.1101/2020.05.29.123497 |
work_keys_str_mv | AT liyanchak integratingmultimodaldatathroughinterpretableheterogeneousensembles AT wanglinhua integratingmultimodaldatathroughinterpretableheterogeneousensembles AT lawjeffreyn integratingmultimodaldatathroughinterpretableheterogeneousensembles AT muralitm integratingmultimodaldatathroughinterpretableheterogeneousensembles AT pandeygaurav integratingmultimodaldatathroughinterpretableheterogeneousensembles |
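
The late integration idea summarized in the description (one local model per modality, then an ensemble over the local predictions) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available at the GitHub link in the record. Everything here is an assumption made for illustration: the nearest-centroid scorer stands in for whatever base learner suits a modality, simple mean aggregation stands in for the heterogeneous ensemble step, and the modality names (`labs`, `vitals`) and toy values are hypothetical.

```python
from statistics import mean


def fit_centroid_model(rows, labels):
    """Fit a tiny nearest-centroid scorer on ONE modality.

    Illustrative stand-in for any local predictive model; returns a
    function mapping a feature row to a score in [0, 1].
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    c_pos = [mean(col) for col in zip(*pos)]  # centroid of positive class
    c_neg = [mean(col) for col in zip(*neg)]  # centroid of negative class

    def score(row):
        d_pos = sum((a - b) ** 2 for a, b in zip(row, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(row, c_neg))
        if d_pos + d_neg == 0:
            return 0.5  # centroids coincide with the row: no evidence
        # Closer to the positive centroid gives a score nearer 1.
        return d_neg / (d_pos + d_neg)

    return score


def fit_late_integration(modalities, labels):
    """Train one local model per modality, then average their scores.

    `modalities` maps a (hypothetical) modality name to its feature rows;
    all modalities describe the same samples in the same order.
    """
    local_models = {
        name: fit_centroid_model(rows, labels)
        for name, rows in modalities.items()
    }

    def predict(sample):
        # Mean aggregation: an assumed, deliberately simple ensemble step.
        return mean(local_models[name](sample[name]) for name in local_models)

    return predict


# Toy data: two modalities with different feature counts, four samples.
modalities = {
    "labs": [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 0.9]],
    "vitals": [[0.9], [0.8], [0.3], [0.1]],
}
labels = [1, 1, 0, 0]

predict = fit_late_integration(modalities, labels)
p_pos = predict({"labs": [0.95, 0.15], "vitals": [0.85]})
p_neg = predict({"labs": [0.15, 0.85], "vitals": [0.2]})
print(p_pos > p_neg)
```

Because each modality keeps its own model and feature space, no uniform integrated representation is built; only the per-modality scores are combined.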