Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs

BACKGROUND: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as Fantom3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of whic...

Bibliographic Details
Main Authors: Khachane, Amit N., Harrison, Paul M.
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859052/
https://www.ncbi.nlm.nih.gov/pubmed/20428234
http://dx.doi.org/10.1371/journal.pone.0010316
author Khachane, Amit N.
Harrison, Paul M.
collection PubMed
description BACKGROUND: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as Fantom3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of ‘transcription noise’. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high-throughput manner. PRINCIPAL FINDINGS: We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that are either syntenically conserved between human and mouse, or that originate from the same protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequences in distant mammals such as dog and cow; (iii) significant sequence conservation relative to their corresponding flanking regions (in 50% of cases, the flanking regions have no homology at all; in the remainder, the degree of conservation is significantly lower); (iv) existence mostly as single-exon forms (8/11); and (v) the presence of conserved and stable secondary structure motifs within them. We further identified orthologous protein-coding genes that contribute to the pool of lncRNAs; among these, genes implicated in carcinogenesis are significantly over-represented. CONCLUSION: Our comparative mammalian genomics approach, coupled with evolutionary analysis, identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that among the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms.
format Text
id pubmed-2859052
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2859052 2010-04-28 Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs Khachane, Amit N. Harrison, Paul M. PLoS One Research Article Public Library of Science 2010-04-23 /pmc/articles/PMC2859052/ /pubmed/20428234 http://dx.doi.org/10.1371/journal.pone.0010316 Text en Khachane, Harrison. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859052/
https://www.ncbi.nlm.nih.gov/pubmed/20428234
http://dx.doi.org/10.1371/journal.pone.0010316