GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner
There has been much effort to prioritize genomic variants with respect to their impact on “function”. However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we c...
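Main authors: | Lou, Shaoke; Cotter, Kellie A.; Li, Tianxiao; Liang, Jin; Mohsen, Hussein; Liu, Jason; Zhang, Jing; Cohen, Sandra; Xu, Jinrui; Yu, Haiyuan; Rubin, Mark A.; Gerstein, Mark |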
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742416/ https://www.ncbi.nlm.nih.gov/pubmed/31469829 http://dx.doi.org/10.1371/journal.pgen.1007860 |
_version_ | 1783451110153912320 |
---|---|
author | Lou, Shaoke Cotter, Kellie A. Li, Tianxiao Liang, Jin Mohsen, Hussein Liu, Jason Zhang, Jing Cohen, Sandra Xu, Jinrui Yu, Haiyuan Rubin, Mark A. Gerstein, Mark |
author_facet | Lou, Shaoke Cotter, Kellie A. Li, Tianxiao Liang, Jin Mohsen, Hussein Liu, Jason Zhang, Jing Cohen, Sandra Xu, Jinrui Yu, Haiyuan Rubin, Mark A. Gerstein, Mark |
author_sort | Lou, Shaoke |
collection | PubMed |
description | There has been much effort to prioritize genomic variants with respect to their impact on “function”. However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. Firstly, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding most predictive, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, has almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out. In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL). |
format | Online Article Text |
id | pubmed-6742416 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6742416 2019-09-20 GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner Lou, Shaoke Cotter, Kellie A. Li, Tianxiao Liang, Jin Mohsen, Hussein Liu, Jason Zhang, Jing Cohen, Sandra Xu, Jinrui Yu, Haiyuan Rubin, Mark A. Gerstein, Mark PLoS Genet Research Article There has been much effort to prioritize genomic variants with respect to their impact on “function”. However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. Firstly, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding most predictive, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, has almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out. In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL). Public Library of Science 2019-08-30 /pmc/articles/PMC6742416/ /pubmed/31469829 http://dx.doi.org/10.1371/journal.pgen.1007860 Text en © 2019 Lou et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Lou, Shaoke Cotter, Kellie A. Li, Tianxiao Liang, Jin Mohsen, Hussein Liu, Jason Zhang, Jing Cohen, Sandra Xu, Jinrui Yu, Haiyuan Rubin, Mark A. Gerstein, Mark GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner |
title | GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner |
title_full | GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner |
title_fullStr | GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner |
title_full_unstemmed | GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner |
title_short | GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner |
title_sort | gram: a generalized model to predict the molecular effect of a non-coding variant in a cell-type specific manner |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742416/ https://www.ncbi.nlm.nih.gov/pubmed/31469829 http://dx.doi.org/10.1371/journal.pgen.1007860 |