DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding

MOTIVATION: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have bee...

Bibliographic Details
Main authors: Ma, Wenxiu, Yang, Lin, Rohs, Remo, Noble, William Stafford
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870879/
https://www.ncbi.nlm.nih.gov/pubmed/28541376
http://dx.doi.org/10.1093/bioinformatics/btx336
author Ma, Wenxiu
Yang, Lin
Rohs, Remo
Noble, William Stafford
collection PubMed
description MOTIVATION: Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites. RESULTS: We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values. AVAILABILITY AND IMPLEMENTATION: The software is available at https://bitbucket.org/wenxiu/sequence-shape.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5870879
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5870879 2018-03-29 Bioinformatics Original Papers Oxford University Press 2017-10-01 2017-05-24 Text en © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870879/
https://www.ncbi.nlm.nih.gov/pubmed/28541376
http://dx.doi.org/10.1093/bioinformatics/btx336
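
The classical k-spectrum kernel named in the description can be illustrated with a minimal sketch. This is not the authors' software and covers neither the shape features nor the di-mismatch kernel; it only assumes the standard definition of the spectrum kernel as the inner product of two k-mer count vectors. The function names `kmer_counts` and `spectrum_kernel` are hypothetical and chosen for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (empty if len(seq) < k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """Inner product of the k-mer count vectors of sequences x and y.

    Assumption: plain k-spectrum kernel, no mismatches, no shape features.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Only k-mers present in both sequences contribute to the sum;
    # Counter returns 0 for k-mers missing from y.
    return sum(n * cy[kmer] for kmer, n in cx.items())
```

Because the kernel compares k-mer content rather than positions, the two sequences need not be aligned or of equal length, which is the property the title refers to as alignment-free modeling.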