Most transcription factor binding sites are in a few mosaic classes of the human genome

BACKGROUND: Many algorithms for finding transcription factor binding sites have concentrated on the characterisation of the binding site itself: and these algorithms lead to a large number of false positive sites. The DNA sequence which does not bind has been modeled only to the extent necessary to...

Bibliographic Details
Main author: Evans, Kenneth J
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881025/
https://www.ncbi.nlm.nih.gov/pubmed/20459624
http://dx.doi.org/10.1186/1471-2164-11-286
_version_ 1782182076602122240
author Evans, Kenneth J
collection PubMed
description BACKGROUND: Many algorithms for finding transcription factor binding sites have concentrated on the characterisation of the binding site itself: and these algorithms lead to a large number of false positive sites. The DNA sequence which does not bind has been modeled only to the extent necessary to complement this formulation. RESULTS: We find that the human genome may be described by 19 pairs of mosaic classes, each defined by its base frequencies, (or more precisely by the frequencies of doublets), so that typically a run of 10 to 100 bases belongs to the same class. Most experimentally verified binding sites are in the same four pairs of classes. In our sample of seventeen transcription factors — taken from different families of transcription factors — the average proportion of sites in this subset of classes was 75%, with values for individual factors ranging from 48% to 98%. By contrast these same classes contain only 26% of the bases of the genome and only 31% of occurrences of the motifs of these factors — that is places where one might expect the factors to bind. These results are not a consequence of the class composition in promoter regions. CONCLUSIONS: This method of analysis will help to find transcription factor binding sites and assist with the problem of false positives. These results also imply a profound difference between the mosaic classes.
format Text
id pubmed-2881025
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2881025 2010-06-05 Most transcription factor binding sites are in a few mosaic classes of the human genome Evans, Kenneth J BMC Genomics Research Article BioMed Central 2010-05-06 /pmc/articles/PMC2881025/ /pubmed/20459624 http://dx.doi.org/10.1186/1471-2164-11-286 Text en Copyright ©2010 Evans; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Most transcription factor binding sites are in a few mosaic classes of the human genome
topic Research Article
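
The abstract says each mosaic class is defined by its doublet frequencies, and that the four pairs of classes holding 75% of binding sites cover only 26% of genome bases and 31% of motif occurrences. The Python sketch below illustrates those two ideas only: counting overlapping doublets in a sequence, and turning the quoted percentages into rough fold-enrichment ratios. It is not the author's segmentation method; the function names `doublet_frequencies` and `fold_enrichment` and the example sequence are hypothetical, and dividing an average proportion by a genome-wide share is a simplification made for illustration.

```python
from collections import Counter


def doublet_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping doublet (dinucleotide) in seq.

    Hypothetical helper for illustration; doublets containing anything
    other than A, C, G or T (for example N) are skipped.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    counts = Counter(pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {doublet: n / total for doublet, n in counts.items()}


def fold_enrichment(site_fraction: float, background_fraction: float) -> float:
    """Ratio of the observed share of sites to a background share."""
    return site_fraction / background_fraction


if __name__ == "__main__":
    # Made-up 10-base sequence, not taken from the paper.
    print(doublet_frequencies("ACGTACGTAC"))

    # Percentages quoted in the abstract: 75% of sites, 26% of genome
    # bases, 31% of motif occurrences.
    print(round(fold_enrichment(0.75, 0.26), 1))  # versus genome bases
    print(round(fold_enrichment(0.75, 0.31), 1))  # versus motif occurrences
```

Under these assumptions the quoted figures correspond to an enrichment of a little under threefold relative to genome bases and somewhat less relative to motif occurrences.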