
De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis

Bibliographic Details
Main authors: Boeva, Valentina; Surdez, Didier; Guillon, Noëlle; Tirode, Franck; Fejes, Anthony P.; Delattre, Olivier; Barillot, Emmanuel
Format: Text
Language: English
Published: Oxford University Press, 2010
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887977/
https://www.ncbi.nlm.nih.gov/pubmed/20375099
http://dx.doi.org/10.1093/nar/gkq217
Collection: PubMed
Description: Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to ∼150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression, positively as often as negatively, and at much larger distances (up to ∼1 Mb).
Record ID: pubmed-2887977
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Res (Methods Online)
Issue date: 2010-06
Published online: 2010-04-07
Copyright: © The Author(s) 2010. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.