Multimodal learning of noncoding variant effects using genome sequence and chromatin structure
MOTIVATION: A growing number of noncoding genetic variants, including single-nucleotide polymorphisms, are found to be associated with complex human traits and diseases. Their mechanistic interpretation remains relatively limited and could benefit from computational prediction of their effects on epig...
Main authors: | Tan, Wuwei; Shen, Yang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10502240/ https://www.ncbi.nlm.nih.gov/pubmed/37669132 http://dx.doi.org/10.1093/bioinformatics/btad541 |
_version_ | 1785106277872959488 |
---|---|
author | Tan, Wuwei Shen, Yang |
author_facet | Tan, Wuwei Shen, Yang |
author_sort | Tan, Wuwei |
collection | PubMed |
description | MOTIVATION: A growing number of noncoding genetic variants, including single-nucleotide polymorphisms, are found to be associated with complex human traits and diseases. Their mechanistic interpretation remains relatively limited and could benefit from computational prediction of their effects on epigenetic profiles. However, current models often focus on local, 1D genome sequence determinants and disregard global, 3D chromatin structure that critically affects epigenetic events. RESULTS: We find that unexpectedly high similarity in epigenetic profiles between noncoding variants, relative to their low similarity in local sequence, can be largely attributed to their proximity in chromatin structure. Accordingly, we have developed a multimodal deep learning scheme that incorporates both 1D genome sequence and 3D chromatin structure data for predicting noncoding variant effects. Specifically, we have integrated convolutional and recurrent neural networks for sequence embedding and graph neural networks for structure embedding, despite the resolution gap between the two types of data, while utilizing recent DNA language models. Numerical results show that our models outperform competing sequence-only models in predicting epigenetic profiles, and that their use of long-range interactions complements sequence-only models in extracting regulatory motifs. They prove to be excellent predictors of noncoding variant effects in gene expression and pathogenicity, whether in unsupervised “zero-shot” learning or supervised “few-shot” learning. AVAILABILITY AND IMPLEMENTATION: Code and data can be accessed at https://github.com/Shen-Lab/ncVarPred-1D3D and https://zenodo.org/record/7975777. |
format | Online Article Text |
id | pubmed-10502240 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10502240 2023-09-16 Multimodal learning of noncoding variant effects using genome sequence and chromatin structure Tan, Wuwei Shen, Yang Bioinformatics Original Paper Oxford University Press 2023-09-05 /pmc/articles/PMC10502240/ /pubmed/37669132 http://dx.doi.org/10.1093/bioinformatics/btad541 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Tan, Wuwei Shen, Yang Multimodal learning of noncoding variant effects using genome sequence and chromatin structure |
title | Multimodal learning of noncoding variant effects using genome sequence and chromatin structure |
title_sort | multimodal learning of noncoding variant effects using genome sequence and chromatin structure |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10502240/ https://www.ncbi.nlm.nih.gov/pubmed/37669132 http://dx.doi.org/10.1093/bioinformatics/btad541 |
work_keys_str_mv | AT tanwuwei multimodallearningofnoncodingvarianteffectsusinggenomesequenceandchromatinstructure AT shenyang multimodallearningofnoncodingvarianteffectsusinggenomesequenceandchromatinstructure |