Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors

BACKGROUND: Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large am...

Bibliographic Details
Main authors: Yip, Kevin Y; Cheng, Chao; Bhardwaj, Nitin; Brown, James B; Leng, Jing; Kundaje, Anshul; Rozowsky, Joel; Birney, Ewan; Bickel, Peter; Snyder, Michael; Gerstein, Mark
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491392/
https://www.ncbi.nlm.nih.gov/pubmed/22950945
http://dx.doi.org/10.1186/gb-2012-13-9-r48
_version_ 1782248986411794432
author Yip, Kevin Y
Cheng, Chao
Bhardwaj, Nitin
Brown, James B
Leng, Jing
Kundaje, Anshul
Rozowsky, Joel
Birney, Ewan
Bickel, Peter
Snyder, Michael
Gerstein, Mark
author_sort Yip, Kevin Y
collection PubMed
description BACKGROUND: Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data creates a valuable resource, it is nonetheless overwhelmingly complex and simultaneously incomplete since it covers only a small fraction of all human transcription factors. RESULTS: As part of the consortium effort in providing a concise abstraction of the data for facilitating various types of downstream analyses, we constructed statistical models that capture the genomic features of three paired types of regions by machine-learning methods: firstly, regions with active or inactive binding; secondly, those with extremely high or low degrees of co-binding, termed HOT and LOT regions; and finally, regulatory modules proximal or distal to genes. From the distal regulatory modules, we developed computational pipelines to identify potential enhancers, many of which were validated experimentally. We further associated the predicted enhancers with potential target transcripts and the transcription factors involved. For HOT regions, we found a significant fraction of transcription factor binding without clear sequence motifs and showed that this observation could be related to strong DNA accessibility of these regions. CONCLUSIONS: Overall, the three pairs of regions exhibit intricate differences in chromosomal locations, chromatin features, factors that bind them, and cell-type specificity. Our machine learning approach enables us to identify features potentially general to all transcription factors, including those not included in the data.
format Online
Article
Text
id pubmed-3491392
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-34913922012-11-07 Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors Yip, Kevin Y Cheng, Chao Bhardwaj, Nitin Brown, James B Leng, Jing Kundaje, Anshul Rozowsky, Joel Birney, Ewan Bickel, Peter Snyder, Michael Gerstein, Mark Genome Biol Research BACKGROUND: Transcription factors function by binding different classes of regulatory elements. The Encyclopedia of DNA Elements (ENCODE) project has recently produced binding data for more than 100 transcription factors from about 500 ChIP-seq experiments in multiple cell types. While this large amount of data creates a valuable resource, it is nonetheless overwhelmingly complex and simultaneously incomplete since it covers only a small fraction of all human transcription factors. RESULTS: As part of the consortium effort in providing a concise abstraction of the data for facilitating various types of downstream analyses, we constructed statistical models that capture the genomic features of three paired types of regions by machine-learning methods: firstly, regions with active or inactive binding; secondly, those with extremely high or low degrees of co-binding, termed HOT and LOT regions; and finally, regulatory modules proximal or distal to genes. From the distal regulatory modules, we developed computational pipelines to identify potential enhancers, many of which were validated experimentally. We further associated the predicted enhancers with potential target transcripts and the transcription factors involved. For HOT regions, we found a significant fraction of transcription factor binding without clear sequence motifs and showed that this observation could be related to strong DNA accessibility of these regions. CONCLUSIONS: Overall, the three pairs of regions exhibit intricate differences in chromosomal locations, chromatin features, factors that bind them, and cell-type specificity. 
Our machine learning approach enables us to identify features potentially general to all transcription factors, including those not included in the data. BioMed Central 2012 2012-09-05 /pmc/articles/PMC3491392/ /pubmed/22950945 http://dx.doi.org/10.1186/gb-2012-13-9-r48 Text en Copyright ©2012 Yip et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3491392/
https://www.ncbi.nlm.nih.gov/pubmed/22950945
http://dx.doi.org/10.1186/gb-2012-13-9-r48