Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

Bibliographic Details
Main authors: Hemberg, Martin; Gray, Jesse M.; Cloonan, Nicole; Kuersten, Scott; Grimmond, Sean; Greenberg, Michael E.; Kreiman, Gabriel
Format: Online article, text
Language: English
Published: Oxford University Press, 2012
Subjects: Genomics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439890/
https://www.ncbi.nlm.nih.gov/pubmed/22684627
http://dx.doi.org/10.1093/nar/gks477
Collection: PubMed
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Research
Issue date: 2012-09; published online: 2012-06-08
License: © The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequence, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands encode non-coding RNAs (ncRNAs); alternatively, they could serve as promoter-distal regulatory factor binding sites (RFBSs) such as enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNase hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF, or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10- to 100-fold lower than annotated coding RNAs or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements.
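The central comparison in the abstract asks how often unannotated conserved islands coincide with RFBSs versus unannotated ncRNAs, and then labels RFBSs by the factors and marks found there. The Python below is a minimal sketch of that kind of interval-overlap tally under stated assumptions; it is not the authors' pipeline. The function names, the toy coordinates, and the rule of checking CTCF before p300/CBP plus H3K4me1 are hypothetical choices for illustration, and intervals are assumed to be BED-like (chromosome, start, end) tuples with 0-based, half-open coordinates.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed BED-like coordinates: (chromosome, start, end), 0-based and half-open.
Interval = Tuple[str, int, int]
ChromIndex = Dict[str, List[Tuple[int, int]]]


def index_by_chrom(intervals: List[Interval]) -> ChromIndex:
    """Group feature intervals by chromosome so each query only scans its own chromosome."""
    by_chrom: ChromIndex = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    return by_chrom


def overlaps_any(island: Interval, index: ChromIndex) -> bool:
    """True if the island shares at least one base with any indexed feature."""
    chrom, start, end = island
    # Half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
    return any(s < end and start < e for s, e in index.get(chrom, []))


def overlap_fraction(islands: List[Interval], features: List[Interval]) -> float:
    """Fraction of islands overlapping at least one feature (0.0 when there are no islands)."""
    if not islands:
        return 0.0
    index = index_by_chrom(features)
    hits = sum(overlaps_any(island, index) for island in islands)
    return hits / len(islands)


def label_rfbs(marks: Set[str]) -> str:
    """Hypothetical labeling rule built from the marks named in the abstract.

    Checking CTCF first is an assumption of this sketch, not the paper's rule.
    """
    if "CTCF" in marks:
        return "insulator"
    if {"p300/CBP", "H3K4me1"} <= marks:
        return "enhancer"
    return "unclassified"


if __name__ == "__main__":
    # Made-up toy coordinates; they do not reproduce any result from the paper.
    islands = [("chr1", 100, 180), ("chr1", 500, 560), ("chr2", 40, 90), ("chr2", 300, 350)]
    rfbs = [("chr1", 150, 400), ("chr2", 0, 60), ("chr2", 320, 330)]
    ncrna = [("chr1", 550, 900)]

    f_rfbs = overlap_fraction(islands, rfbs)    # 3 of 4 islands overlap an RFBS
    f_ncrna = overlap_fraction(islands, ncrna)  # 1 of 4 islands overlaps an ncRNA
    print(f_rfbs, f_ncrna, f_rfbs / f_ncrna)    # 0.75 0.25 3.0
    print(label_rfbs({"CTCF"}), label_rfbs({"p300/CBP", "H3K4me1"}), label_rfbs({"H3K4me1"}))
```

The ratio of the two fractions is the kind of "N times more likely" figure the abstract reports; a real analysis would use genome-scale interval files and an indexed overlap tool rather than this linear scan per chromosome.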