Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data

Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are the best, how the models should be built and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein bindin...

Bibliographic Details
Main Authors:	Kähärä, Juhani, Lähdesmäki, Harri
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750486/ https://www.ncbi.nlm.nih.gov/pubmed/24267147 http://dx.doi.org/10.1186/1471-2105-14-S10-S2

_version_	1782281424934535168
author	Kähärä, Juhani Lähdesmäki, Harri
author_facet	Kähärä, Juhani Lähdesmäki, Harri
author_sort	Kähärä, Juhani
collection	PubMed
description	Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are the best, how the models should be built and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented the standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a great negative effect on prediction accuracy. We also found that later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of k-mer and position-specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the performance of the k-mer model was on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.
format	Online Article Text
id	pubmed-3750486
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3750486 2013-08-27 Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data Kähärä, Juhani Lähdesmäki, Harri BMC Bioinformatics Research BioMed Central 2013-08-12 /pmc/articles/PMC3750486/ /pubmed/24267147 http://dx.doi.org/10.1186/1471-2105-14-S10-S2 Text en Copyright © 2013 Kähärä and Lähdesmäki; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3750486/ https://www.ncbi.nlm.nih.gov/pubmed/24267147 http://dx.doi.org/10.1186/1471-2105-14-S10-S2
work_keys_str_mv	AT kaharajuhani evaluatingalinearkmermodelforproteindnainteractionsusinghighthroughputselexdata AT lahdesmakiharri evaluatingalinearkmermodelforproteindnainteractionsusinghighthroughputselexdata
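
The description above refers to a linear k-mer model that scores a DNA sequence from the k-mers it contains. The following is a minimal sketch of that general idea, not the authors' implementation: the function names, the toy sequences, the choice of k = 4, and the use of a plain perceptron to fit the weights are all assumptions made for illustration. The record does not state how the published model was fitted.

```python
from collections import Counter
from itertools import product

def all_kmers(k):
    """Enumerate every DNA k-mer in a fixed order (4**k feature names)."""
    return ["".join(p) for p in product("ACGT", repeat=k)]

def kmer_counts(seq, k):
    """Count every overlapping k-mer occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def featurize(seq, kmers, k):
    """Map a sequence to a vector of k-mer counts."""
    counts = kmer_counts(seq, k)
    return [counts[m] for m in kmers]

def score(weights, bias, features):
    """Linear model: bias plus the weighted sum of k-mer counts."""
    return bias + sum(w * x for w, x in zip(weights, features))

def train_perceptron(X, y, epochs=100):
    """Fit weights with a perceptron (a stand-in learner, labels are +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if label * score(w, b, x) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
                mistakes += 1
        if mistakes == 0:  # every training sequence is classified correctly
            break
    return w, b

# Hypothetical toy data: "bound" sequences contain GATA, "unbound" ones do not.
K = 4
bound = ["ACGATAAC", "TTGATACG", "CAGATAGT", "GGGATATC"]
unbound = ["ACGTTAAC", "TTGCCACG", "CAGTCAGT", "GGCATTTC"]

kmers = all_kmers(K)
X = [featurize(s, kmers, K) for s in bound + unbound]
y = [1] * len(bound) + [-1] * len(unbound)
w, b = train_perceptron(X, y)
predictions = [1 if score(w, b, x) > 0 else -1 for x in X]
```

In this sketch each sequence becomes a vector of 4**k counts, so reducing the number of k-mers, as the description discusses, would correspond to dropping columns of `X` before fitting. The toy data only demonstrates that the pieces fit together; it says nothing about accuracy on real SELEX or ChIP-seq data.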
