
Highly accurate disease diagnosis and highly reproducible biomarker identification with PathFormer

Biomarker identification is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis, for example through fold change and regression analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we f...


Bibliographic Details
Main authors: Li, Fuhai, Dong, Zehao, Zhao, Qihang, Payne, Philip, Province, Michael, Cruchaga, Carlos, Zhang, Muhan, Zhao, Tianyu, Chen, Yixin
Format: Online Article Text
Language: English
Published: American Journal Experts 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10680938/
https://www.ncbi.nlm.nih.gov/pubmed/38014034
http://dx.doi.org/10.21203/rs.3.rs-3576068/v1
author Li, Fuhai
Dong, Zehao
Zhao, Qihang
Payne, Philip
Province, Michael
Cruchaga, Carlos
Zhang, Muhan
Zhao, Tianyu
Chen, Yixin
author_sort Li, Fuhai
collection PubMed
description Biomarker identification is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis, for example through fold change and regression analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis: limited prediction/diagnosis accuracy and limited reproducibility of biomarker identification across multiple datasets. The root of these challenges is the unique graph structure of biological signaling pathways, which consist of a large number of targets and dense, complex signaling interactions among these targets. To resolve these two challenges, in this study we presented a novel GNN model architecture, named PathFormer, which systematically integrates signaling networks, prior knowledge, and omics data to rank biomarkers and predict disease diagnosis. In the comparison results, PathFormer significantly outperformed existing GNN models in prediction accuracy (~30% accuracy improvement in disease diagnosis compared with existing GNN models) and in reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer’s Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.
format Online
Article
Text
id pubmed-10680938
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Journal Experts
record_format MEDLINE/PubMed
spelling pubmed-10680938 2023-11-27 Highly accurate disease diagnosis and highly reproducible biomarker identification with PathFormer Li, Fuhai Dong, Zehao Zhao, Qihang Payne, Philip Province, Michael Cruchaga, Carlos Zhang, Muhan Zhao, Tianyu Chen, Yixin Res Sq Article Biomarker identification is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis, for example through fold change and regression analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis: limited prediction/diagnosis accuracy and limited reproducibility of biomarker identification across multiple datasets. The root of these challenges is the unique graph structure of biological signaling pathways, which consist of a large number of targets and dense, complex signaling interactions among these targets. To resolve these two challenges, in this study we presented a novel GNN model architecture, named PathFormer, which systematically integrates signaling networks, prior knowledge, and omics data to rank biomarkers and predict disease diagnosis. In the comparison results, PathFormer significantly outperformed existing GNN models in prediction accuracy (~30% accuracy improvement in disease diagnosis compared with existing GNN models) and in reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer’s Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.
American Journal Experts 2023-11-16 /pmc/articles/PMC10680938/ /pubmed/38014034 http://dx.doi.org/10.21203/rs.3.rs-3576068/v1 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Highly accurate disease diagnosis and highly reproducible biomarker identification with PathFormer
topic Article