Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach
BACKGROUND: Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investi...
Main authors: | Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C.; Bryson-Richardson, Robert J.; Keith, Jonathan M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5369223/ https://www.ncbi.nlm.nih.gov/pubmed/28347272 http://dx.doi.org/10.1186/s12864-017-3645-2 |
_version_ | 1782518090649567232 |
---|---|
author | Algama, Manjula Tasker, Edward Williams, Caitlin Parslow, Adam C. Bryson-Richardson, Robert J. Keith, Jonathan M. |
author_sort | Algama, Manjula |
collection | PubMed |
description | BACKGROUND: Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. RESULTS: We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. CONCLUSIONS: This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3645-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5369223 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5369223 2017-03-30 Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach Algama, Manjula Tasker, Edward Williams, Caitlin Parslow, Adam C. Bryson-Richardson, Robert J. Keith, Jonathan M. BMC Genomics Research Article BACKGROUND: Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. RESULTS: We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. CONCLUSIONS: This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3645-2) contains supplementary material, which is available to authorized users. BioMed Central 2017-03-27 /pmc/articles/PMC5369223/ /pubmed/28347272 http://dx.doi.org/10.1186/s12864-017-3645-2 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Algama, Manjula Tasker, Edward Williams, Caitlin Parslow, Adam C. Bryson-Richardson, Robert J. Keith, Jonathan M. Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach |
title | Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5369223/ https://www.ncbi.nlm.nih.gov/pubmed/28347272 http://dx.doi.org/10.1186/s12864-017-3645-2 |