A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction

The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (...

Bibliographic Details
Main authors: Guo, Yuchun; Tian, Kevin; Zeng, Haoyang; Guo, Xiaoyun; Gifford, David Kenneth
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5991515/
https://www.ncbi.nlm.nih.gov/pubmed/29654070
http://dx.doi.org/10.1101/gr.226852.117
author Guo, Yuchun
Tian, Kevin
Zeng, Haoyang
Guo, Xiaoyun
Gifford, David Kenneth
collection PubMed
description The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations.
format Online
Article
Text
id pubmed-5991515
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-5991515 2018-12-01 A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction Guo, Yuchun; Tian, Kevin; Zeng, Haoyang; Guo, Xiaoyun; Gifford, David Kenneth Genome Res Method
Cold Spring Harbor Laboratory Press 2018-06 /pmc/articles/PMC5991515/ /pubmed/29654070 http://dx.doi.org/10.1101/gr.226852.117 Text en © 2018 Guo et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction
topic Method