Identification of transcribed protein coding sequence remnants within lincRNAs
Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. How...
Main Authors: | Talyan, Sweta; Andrade-Navarro, Miguel A; Muro, Enrique M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Computational Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6158594/ https://www.ncbi.nlm.nih.gov/pubmed/29986053 http://dx.doi.org/10.1093/nar/gky608 |
_version_ | 1783358446240792576 |
---|---|
author | Talyan, Sweta; Andrade-Navarro, Miguel A; Muro, Enrique M |
author_facet | Talyan, Sweta; Andrade-Navarro, Miguel A; Muro, Enrique M |
author_sort | Talyan, Sweta |
collection | PubMed |
description | Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. However, very little is known of the mechanisms for the evolutionary biogenesis of these RNA elements, especially given their poor conservation across species. It has been proposed that lincRNAs might arise from pseudogenes. To test this systematically, we developed a novel method that searches for remnants of protein-coding sequences within lincRNA transcripts; the hypothesis is that we can trace back their biogenesis from protein-coding genes or posterior transposon/retrotransposon insertions. Applying this method, we found 203 human lincRNA genes with regions significantly similar to protein-coding sequences. Our method provides a visualization tool to trace the evolutionary biogenesis of lincRNAs with respect to protein-coding genes by sequence divergence. Subsequently, we show the expression correlation between lincRNAs and their identified parental protein-coding genes using public RNA-seq repositories, hinting at novel gene regulatory relationships. In summary, we developed a novel computational methodology to study non-coding gene sequences, which can be applied to identify the evolutionary biogenesis and function of lincRNAs. |
format | Online Article Text |
id | pubmed-6158594 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-61585942018-10-02 Identification of transcribed protein coding sequence remnants within lincRNAs Talyan, Sweta Andrade-Navarro, Miguel A Muro, Enrique M Nucleic Acids Res Computational Biology Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. However, very little is known of the mechanisms for the evolutionary biogenesis of these RNA elements, especially given their poor conservation across species. It has been proposed that lincRNAs might arise from pseudogenes. To test this systematically, we developed a novel method that searches for remnants of protein-coding sequences within lincRNA transcripts; the hypothesis is that we can trace back their biogenesis from protein-coding genes or posterior transposon/retrotransposon insertions. Applying this method, we found 203 human lincRNA genes with regions significantly similar to protein-coding sequences. Our method provides a visualization tool to trace the evolutionary biogenesis of lincRNAs with respect to protein-coding genes by sequence divergence. Subsequently, we show the expression correlation between lincRNAs and their identified parental protein-coding genes using public RNA-seq repositories, hinting at novel gene regulatory relationships. In summary, we developed a novel computational methodology to study non-coding gene sequences, which can be applied to identify the evolutionary biogenesis and function of lincRNAs. Oxford University Press 2018-09-28 2018-07-09 /pmc/articles/PMC6158594/ /pubmed/29986053 http://dx.doi.org/10.1093/nar/gky608 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Computational Biology Talyan, Sweta Andrade-Navarro, Miguel A Muro, Enrique M Identification of transcribed protein coding sequence remnants within lincRNAs |
title | Identification of transcribed protein coding sequence remnants within lincRNAs |
title_full | Identification of transcribed protein coding sequence remnants within lincRNAs |
title_fullStr | Identification of transcribed protein coding sequence remnants within lincRNAs |
title_full_unstemmed | Identification of transcribed protein coding sequence remnants within lincRNAs |
title_short | Identification of transcribed protein coding sequence remnants within lincRNAs |
title_sort | identification of transcribed protein coding sequence remnants within lincrnas |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6158594/ https://www.ncbi.nlm.nih.gov/pubmed/29986053 http://dx.doi.org/10.1093/nar/gky608 |
work_keys_str_mv | AT talyansweta identificationoftranscribedproteincodingsequenceremnantswithinlincrnas AT andradenavarromiguela identificationoftranscribedproteincodingsequenceremnantswithinlincrnas AT muroenriquem identificationoftranscribedproteincodingsequenceremnantswithinlincrnas |
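
The description above mentions an expression correlation between lincRNAs and their identified parental protein-coding genes. The record does not say which correlation measure or expression units were used, so the following is only a minimal sketch that assumes a Pearson correlation over per-sample expression values. The function name `pearson` and the sample values are hypothetical and are not taken from the article.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values across five samples (illustration only,
# not data from the article): one lincRNA and its candidate parental gene.
lincrna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
parent_expr = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson(lincrna_expr, parent_expr)
print(f"Pearson r = {r:.3f}")
```

A rank-based measure such as Spearman's may suit skewed RNA-seq values better. Pearson is used here only because it is the simplest to show.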