
Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse l...


Bibliographic Details
Main authors: Er, Ahmet Gorkem, Ding, Daisy Yi, Er, Berrin, Uzun, Mertcan, Cakmak, Mehmet, Sadée, Christoph, Durhan, Gamze, Ozmen, Mustafa Nasuh, Tanriover, Mine Durusu, Topeli, Arzu, Son, Yesim Aydin, Tibshirani, Robert, Unal, Serhat, Gevaert, Olivier
Format: Online Article Text
Language: English
Published: American Journal Experts 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10690316/
https://www.ncbi.nlm.nih.gov/pubmed/38045288
http://dx.doi.org/10.21203/rs.3.rs-3569833/v1
_version_ 1785152504013520896
author Er, Ahmet Gorkem
Ding, Daisy Yi
Er, Berrin
Uzun, Mertcan
Cakmak, Mehmet
Sadée, Christoph
Durhan, Gamze
Ozmen, Mustafa Nasuh
Tanriover, Mine Durusu
Topeli, Arzu
Son, Yesim Aydin
Tibshirani, Robert
Unal, Serhat
Gevaert, Olivier
author_sort Er, Ahmet Gorkem
collection PubMed
description Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel ($\mathrm{corr}(Xu^{(1)}, Zv^{(1)}) = 0.596$, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively.
Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
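The quantity reported in the abstract, $\mathrm{corr}(Xu^{(1)}, Zv^{(1)})$, is the correlation between two sparse linear projections of paired data views. The following is a minimal sketch of how a first pair of sparse canonical vectors can be estimated, assuming a penalized-matrix-decomposition style alternating update with a fixed soft-threshold level. It is not the authors' code: the function names, the thresholds `lam_u` and `lam_v`, and the use of a fixed threshold instead of a tuned L1 bound are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry of a toward zero by lam; entries below lam become 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def _unit(a):
    """Scale a to unit L2 norm; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca_rank1(X, Z, lam_u=0.1, lam_v=0.1, n_iter=100, tol=1e-8):
    """Estimate one pair of sparse canonical vectors (u, v) for views X and Z.

    X is (n, p) and Z is (n, q), with rows paired by sample.
    lam_u and lam_v are hypothetical fixed thresholds (an assumption of this
    sketch); larger values give sparser vectors.
    """
    # Standardize columns so that X.T @ Z / n is a cross-correlation matrix.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    C = X.T @ Z / X.shape[0]

    # Initialize v with the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(X.shape[1])

    for _ in range(n_iter):
        # Alternate: update u with v fixed, then v with the new u fixed.
        u_new = _unit(soft_threshold(C @ v, lam_u))
        v_new = _unit(soft_threshold(C.T @ u_new, lam_v))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break

    # Correlation between the two projected views (the canonical variates).
    corr = np.corrcoef(X @ u, Z @ v)[0, 1]
    return u, v, corr


if __name__ == "__main__":
    # Synthetic data only: one shared latent signal in a few columns per view.
    rng = np.random.default_rng(0)
    t = rng.normal(size=(500, 1))
    X = rng.normal(size=(500, 20))
    X[:, :3] = t + 0.5 * rng.normal(size=(500, 3))
    Z = rng.normal(size=(500, 10))
    Z[:, :2] = t + 0.5 * rng.normal(size=(500, 2))
    u, v, corr = sparse_cca_rank1(X, Z, lam_u=0.3, lam_v=0.3)
    print("nonzero entries in u:", np.count_nonzero(u))
    print("nonzero entries in v:", np.count_nonzero(v))
    print("canonical correlation:", corr)
```

In this sketch the nonzero entries of `u` and `v` play the role of the coefficients the abstract discusses for radiomics features and serum biomarkers. If the thresholds are set too high, both vectors collapse to zero and the correlation is undefined, so in practice the sparsity level would be chosen by a procedure such as permutation testing or cross-validation.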
format Online
Article
Text
id pubmed-10690316
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Journal Experts
record_format MEDLINE/PubMed
spelling pubmed-10690316 2023-12-02 Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19 Er, Ahmet Gorkem Ding, Daisy Yi Er, Berrin Uzun, Mertcan Cakmak, Mehmet Sadée, Christoph Durhan, Gamze Ozmen, Mustafa Nasuh Tanriover, Mine Durusu Topeli, Arzu Son, Yesim Aydin Tibshirani, Robert Unal, Serhat Gevaert, Olivier Res Sq Article Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel ($\mathrm{corr}(Xu^{(1)}, Zv^{(1)}) = 0.596$, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks. American Journal Experts 2023-11-20 /pmc/articles/PMC10690316/ /pubmed/38045288 http://dx.doi.org/10.21203/rs.3.rs-3569833/v1 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10690316/
https://www.ncbi.nlm.nih.gov/pubmed/38045288
http://dx.doi.org/10.21203/rs.3.rs-3569833/v1