
GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner

There has been much effort to prioritize genomic variants with respect to their impact on “function”. However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we c...


Bibliographic Details
Main authors: Lou, Shaoke, Cotter, Kellie A., Li, Tianxiao, Liang, Jin, Mohsen, Hussein, Liu, Jason, Zhang, Jing, Cohen, Sandra, Xu, Jinrui, Yu, Haiyuan, Rubin, Mark A., Gerstein, Mark
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742416/
https://www.ncbi.nlm.nih.gov/pubmed/31469829
http://dx.doi.org/10.1371/journal.pgen.1007860
author Lou, Shaoke
Cotter, Kellie A.
Li, Tianxiao
Liang, Jin
Mohsen, Hussein
Liu, Jason
Zhang, Jing
Cohen, Sandra
Xu, Jinrui
Yu, Haiyuan
Rubin, Mark A.
Gerstein, Mark
collection PubMed
description There has been much effort to prioritize genomic variants with respect to their impact on “function”. However, function is often not precisely defined: sometimes it is the disease association of a variant; on other occasions, it reflects a molecular effect on transcription or epigenetics. Here, we coupled multiple genomic predictors to build GRAM, a GeneRAlized Model, to predict a well-defined experimental target: the expression-modulating effect of a non-coding variant on its associated gene, in a transferable, cell-specific manner. First, we performed feature engineering: using LASSO, a regularized linear model, we found transcription factor (TF) binding most predictive, especially for TFs that are hubs in the regulatory network; in contrast, evolutionary conservation, a popular feature in many other variant-impact predictors, has almost no contribution. Moreover, TF binding inferred from in vitro SELEX is as effective as that from in vivo ChIP-Seq. Second, we implemented GRAM integrating only SELEX features and expression profiles; thus, the program combines a universal regulatory score with an easily obtainable modifier reflecting the particular cell type. We benchmarked GRAM on large-scale MPRA datasets, achieving AUROC scores of 0.72 in GM12878 and 0.66 in a multi-cell-line dataset. We then evaluated the performance of GRAM on targeted regions using luciferase assays in the MCF7 and K562 cell lines. We noted that changing the insertion position of the construct relative to the reporter gene gave very different results, highlighting the importance of carefully defining the exact prediction target of the model. Finally, we illustrated the utility of GRAM in fine-mapping causal variants and developed a practical software pipeline to carry this out. In particular, we demonstrated in specific examples how the pipeline could pinpoint variants that directly modulate gene expression within a larger linkage-disequilibrium block associated with a phenotype of interest (e.g., for an eQTL).
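The abstract reports benchmark results as AUROC scores. As a reading aid, the following is a minimal sketch of how an AUROC can be computed from binary labels and predicted scores using the rank-sum (Mann-Whitney U) identity. It is not taken from the GRAM software; the function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative, counting ties as one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Assign 1-based ranks by ascending score, averaging ranks over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # U statistic of the positive class, normalized by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Hypothetical toy data: 1 = variant modulates expression, 0 = it does not.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

On this scale, 0.5 corresponds to random ranking and 1.0 to perfect separation of positives from negatives, which is the context for reading the reported values of 0.72 and 0.66.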
format Online
Article
Text
id pubmed-6742416
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6742416 2019-09-20 PLoS Genet Research Article Public Library of Science 2019-08-30 /pmc/articles/PMC6742416/ /pubmed/31469829 http://dx.doi.org/10.1371/journal.pgen.1007860 Text en © 2019 Lou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title GRAM: A GeneRAlized Model to predict the molecular effect of a non-coding variant in a cell-type specific manner
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6742416/
https://www.ncbi.nlm.nih.gov/pubmed/31469829
http://dx.doi.org/10.1371/journal.pgen.1007860