Identification of transcribed protein coding sequence remnants within lincRNAs

Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. How...

Bibliographic Details
Main authors: Talyan, Sweta; Andrade-Navarro, Miguel A; Muro, Enrique M
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6158594/
https://www.ncbi.nlm.nih.gov/pubmed/29986053
http://dx.doi.org/10.1093/nar/gky608
author Talyan, Sweta
Andrade-Navarro, Miguel A
Muro, Enrique M
collection PubMed
description Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. However, very little is known of the mechanisms for the evolutionary biogenesis of these RNA elements, especially given their poor conservation across species. It has been proposed that lincRNAs might arise from pseudogenes. To test this systematically, we developed a novel method that searches for remnants of protein-coding sequences within lincRNA transcripts; the hypothesis is that we can trace back their biogenesis from protein-coding genes or posterior transposon/retrotransposon insertions. Applying this method, we found 203 human lincRNA genes with regions significantly similar to protein-coding sequences. Our method provides a visualization tool to trace the evolutionary biogenesis of lincRNAs with respect to protein-coding genes by sequence divergence. Subsequently, we show the expression correlation between lincRNAs and their identified parental protein-coding genes using public RNA-seq repositories, hinting at novel gene regulatory relationships. In summary, we developed a novel computational methodology to study non-coding gene sequences, which can be applied to identify the evolutionary biogenesis and function of lincRNAs.
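The expression-correlation step mentioned in the description can be illustrated with a minimal sketch. It assumes expression values (TPM) measured for a lincRNA and its candidate parental protein-coding gene across the same samples, and it uses a Pearson correlation on log2(TPM + 1) values. The function names, the choice of Pearson correlation, the log transform, and the numbers are all assumptions made for illustration; the record does not state which statistic or data the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def expression_correlation(lincrna_tpm, parent_tpm):
    """Correlate log2(TPM + 1) values measured across the same samples.

    Hypothetical helper: the transform and the statistic are assumptions.
    """
    log_linc = [math.log2(v + 1.0) for v in lincrna_tpm]
    log_parent = [math.log2(v + 1.0) for v in parent_tpm]
    return pearson(log_linc, log_parent)


# Made-up expression values in four samples; not data from the article.
lincrna_tpm = [0.0, 1.0, 3.0, 7.0]
parent_tpm = [1.0, 3.0, 7.0, 15.0]
r = expression_correlation(lincrna_tpm, parent_tpm)
print(f"correlation: {r:.3f}")
```

A coefficient near 1 or -1 across many samples would only hint at a regulatory relationship; it does not establish one, and a rank-based statistic may be preferable for skewed expression data.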
format Online
Article
Text
id pubmed-6158594
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6158594 2018-10-02 Identification of transcribed protein coding sequence remnants within lincRNAs Talyan, Sweta Andrade-Navarro, Miguel A Muro, Enrique M Nucleic Acids Res Computational Biology Oxford University Press 2018-09-28 2018-07-09 /pmc/articles/PMC6158594/ /pubmed/29986053 http://dx.doi.org/10.1093/nar/gky608 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Identification of transcribed protein coding sequence remnants within lincRNAs
topic Computational Biology