Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data

Bibliographic details
Main authors:	Levitsky, Victor G, Kulakovskiy, Ivan V, Ershov, Nikita I, Oshchepkov, Dmitry Yu, Makeev, Vsevolod J, Hodgman, T C, Merkulova, Tatyana I
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234207/ https://www.ncbi.nlm.nih.gov/pubmed/24472686 http://dx.doi.org/10.1186/1471-2164-15-80

Abstract

BACKGROUND: ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TFs), either directly at DNA binding sites (BSs) or indirectly via other proteins. Many software tools currently implement different approaches to identifying TFBSs within ChIP-Seq peaks. However, their use for interpreting ChIP-Seq data is usually complicated by the absence of direct experimental verification, which makes it difficult both to set a threshold that avoids recognizing too many false-positive BSs and to compare the actual performance of different models.

RESULTS: Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells, we compared FoxA binding-site predictions from four computational models of two fundamental classes: pattern matching based on an existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To select appropriate prediction thresholds for the models, we experimentally evaluated the affinity of 64 predicted FoxA BSs using EMSA, which reliably distinguishes sequences able to bind the TF. As a result, we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. The performance of conventional position weight matrix (PWM) models was inferior, with the highest false-positive rate. In contrast, the best recognition efficiency was achieved by combining the SiteGA and diChIPMunk/ChIPMunk models, which correctly identified FoxA BSs in up to 90% of loci in both the mouse and human ChIP-Seq datasets.

CONCLUSIONS: Experimentally studying TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational TFBS-recognition methods in ChIP-Seq data analysis. For ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared with the more sophisticated SiteGA and de novo motif discovery methods. Combining models based on different principles allowed proper TFBSs to be identified.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-80) contains supplementary material, which is available to authorized users.
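The abstract's central difficulty, choosing a score threshold so that a PWM does not report too many false-positive BSs, is easier to see with a concrete scan. The following Python sketch shows a generic, conventional PWM scan: per-position counts are converted into log-odds weights against a uniform background, every window of a sequence is scored on both strands, and only windows scoring at or above a threshold are reported. The count matrix, pseudocount, background and threshold are hypothetical toy values chosen for illustration; this is not a FoxA model, and it does not reproduce the oPWM, SiteGA or ChIPMunk implementations used in the study.

```python
import math

# Hypothetical toy count matrix (NOT a real FoxA model): one row per motif
# position, columns in the order A, C, G, T. Consensus is "ACGA".
COUNTS = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [9, 0, 1, 0],
]
ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position counts to log2(p / background) weights."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append([math.log2((c + pseudocount) / total / background) for c in row])
    return pwm


def score(pwm, word):
    """Sum the weight of each nucleotide of a motif-length word."""
    return sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(word))


def scan(pwm, seq, threshold):
    """Return (position, strand, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for pos in range(len(seq) - width + 1):
        word = seq[pos:pos + width]
        if any(b not in ALPHABET for b in word):
            continue  # skip windows containing N or other ambiguity codes
        forward = score(pwm, word)
        # Reverse complement: complement each base, then reverse the word.
        reverse = score(pwm, word.translate(COMPLEMENT)[::-1])
        for strand, s in (("+", forward), ("-", reverse)):
            if s >= threshold:
                hits.append((pos, strand, round(s, 2)))
    return hits


pwm = log_odds_matrix(COUNTS)
print(scan(pwm, "TTACGAT", threshold=4.0))
```

With a threshold of 4.0, the scan reports a single forward-strand hit at position 2 (`ACGA`). Lowering the threshold admits progressively weaker windows, which is the trade-off between sensitivity and false positives that the study addressed by calibrating thresholds against EMSA-tested sites.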

Journal:	BMC Genomics
Published online:	2014-01-29
Copyright:	© Levitsky et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.