Multimodal learning of noncoding variant effects using genome sequence and chromatin structure

Bibliographic Details
Main authors: Tan, Wuwei; Shen, Yang
Format: Online Article Text
Language: English
Published: Oxford University Press, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10502240/
https://www.ncbi.nlm.nih.gov/pubmed/37669132
http://dx.doi.org/10.1093/bioinformatics/btad541
author Tan, Wuwei
Shen, Yang
collection PubMed
description MOTIVATION: A growing number of noncoding genetic variants, including single-nucleotide polymorphisms, are found to be associated with complex human traits and diseases. Their mechanistic interpretation is relatively limited and could benefit from computational prediction of their effects on epigenetic profiles. However, current models often focus on local, 1D genome sequence determinants and disregard global, 3D chromatin structure that critically affects epigenetic events. RESULTS: We find that noncoding variants with unexpectedly high similarity in epigenetic profiles, given their relatively low similarity in local sequences, can be largely attributed to their proximity in chromatin structure. Accordingly, we have developed a multimodal deep learning scheme that incorporates both 1D genome sequence data and 3D chromatin structure data for predicting noncoding variant effects. Specifically, we have integrated convolutional and recurrent neural networks for sequence embedding and graph neural networks for structure embedding despite the resolution gap between the two types of data, while utilizing recent DNA language models. Numerical results show that our models outperform competing sequence-only models in predicting epigenetic profiles, and that their use of long-range interactions complements sequence-only models in extracting regulatory motifs. They prove to be excellent predictors of noncoding variant effects on gene expression and pathogenicity, whether in unsupervised “zero-shot” learning or supervised “few-shot” learning. AVAILABILITY AND IMPLEMENTATION: Code and data can be accessed at https://github.com/Shen-Lab/ncVarPred-1D3D and https://zenodo.org/record/7975777.
format Online
Article
Text
id pubmed-10502240
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10502240 2023-09-16 Multimodal learning of noncoding variant effects using genome sequence and chromatin structure Tan, Wuwei Shen, Yang Bioinformatics Original Paper
Oxford University Press 2023-09-05 /pmc/articles/PMC10502240/ /pubmed/37669132 http://dx.doi.org/10.1093/bioinformatics/btad541 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multimodal learning of noncoding variant effects using genome sequence and chromatin structure
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10502240/
https://www.ncbi.nlm.nih.gov/pubmed/37669132
http://dx.doi.org/10.1093/bioinformatics/btad541