De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis
Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequ...
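Main authors: | Boeva, Valentina; Surdez, Didier; Guillon, Noëlle; Tirode, Franck; Fejes, Anthony P.; Delattre, Olivier; Barillot, Emmanuel |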
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887977/ https://www.ncbi.nlm.nih.gov/pubmed/20375099 http://dx.doi.org/10.1093/nar/gkq217 |
_version_ | 1782182622522245120 |
---|---|
author | Boeva, Valentina Surdez, Didier Guillon, Noëlle Tirode, Franck Fejes, Anthony P. Delattre, Olivier Barillot, Emmanuel |
author_facet | Boeva, Valentina Surdez, Didier Guillon, Noëlle Tirode, Franck Fejes, Anthony P. Delattre, Olivier Barillot, Emmanuel |
author_sort | Boeva, Valentina |
collection | PubMed |
description | Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to ∼150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression—positively as often as negatively—and at much larger distances (up to ∼1 Mb). |
format | Text |
id | pubmed-2887977 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-28879772010-06-22 De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis Boeva, Valentina Surdez, Didier Guillon, Noëlle Tirode, Franck Fejes, Anthony P. Delattre, Olivier Barillot, Emmanuel Nucleic Acids Res Methods Online Dramatic progress in the development of next-generation sequencing technologies has enabled accurate genome-wide characterization of the binding sites of DNA-associated proteins. This technique, baptized as ChIP-Seq, uses a combination of chromatin immunoprecipitation and massively parallel DNA sequencing. Other published tools that predict binding sites from ChIP-Seq data use only positional information of mapped reads. In contrast, our algorithm MICSA (Motif Identification for ChIP-Seq Analysis) combines this source of positional information with information on motif occurrences to better predict binding sites of transcription factors (TFs). We proved the greater accuracy of MICSA with respect to several other tools by running them on datasets for the TFs NRSF, GABP, STAT1 and CTCF. We also applied MICSA on a dataset for the oncogenic TF EWS-FLI1. We discovered >2000 binding sites and two functionally different binding motifs. We observed that EWS-FLI1 can activate gene transcription when (i) its binding site is located in close proximity to the gene transcription start site (up to ∼150 kb), and (ii) it contains a microsatellite sequence. Furthermore, we observed that sites without microsatellites can also induce regulation of gene expression—positively as often as negatively—and at much larger distances (up to ∼1 Mb). Oxford University Press 2010-06 2010-04-07 /pmc/articles/PMC2887977/ /pubmed/20375099 http://dx.doi.org/10.1093/nar/gkq217 Text en © The Author(s) 2010. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Boeva, Valentina Surdez, Didier Guillon, Noëlle Tirode, Franck Fejes, Anthony P. Delattre, Olivier Barillot, Emmanuel De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis |
title | De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887977/ https://www.ncbi.nlm.nih.gov/pubmed/20375099 http://dx.doi.org/10.1093/nar/gkq217 |