Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs
BACKGROUND: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as Fantom3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of whic...
Main Authors: | Khachane, Amit N.; Harrison, Paul M. |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science 2010 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859052/ https://www.ncbi.nlm.nih.gov/pubmed/20428234 http://dx.doi.org/10.1371/journal.pone.0010316 |
_version_ | 1782180475823980544 |
---|---|
author | Khachane, Amit N. Harrison, Paul M. |
author_facet | Khachane, Amit N. Harrison, Paul M. |
author_sort | Khachane, Amit N. |
collection | PubMed |
description | BACKGROUND: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as Fantom3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of ‘transcription noise’. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high throughput manner. PRINCIPAL FINDINGS: We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that are either syntenically conserved between human and mouse, or that originate from the same protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequences in distant mammals such as dog and cow, (iii) significant sequence conservation relative to their corresponding flanking regions (in 50% cases, flanking regions do not have homology at all; and in the remaining, the degree of conservation is significantly less); (iv) existence mostly as single-exon forms (8/11); and, (v) presence of conserved and stable secondary structure motifs within them. We further identified orthologous protein-coding genes that are contributing to the pool of lncRNAs; of which, genes implicated in carcinogenesis are significantly over-represented. CONCLUSION: Our comparative mammalian genomics approach coupled with evolutionary analysis identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that amongst the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms. |
format | Text |
id | pubmed-2859052 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2859052 2010-04-28 Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs Khachane, Amit N. Harrison, Paul M. PLoS One Research Article BACKGROUND: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as Fantom3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of ‘transcription noise’. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high throughput manner. PRINCIPAL FINDINGS: We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that are either syntenically conserved between human and mouse, or that originate from the same protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequences in distant mammals such as dog and cow, (iii) significant sequence conservation relative to their corresponding flanking regions (in 50% cases, flanking regions do not have homology at all; and in the remaining, the degree of conservation is significantly less); (iv) existence mostly as single-exon forms (8/11); and, (v) presence of conserved and stable secondary structure motifs within them. We further identified orthologous protein-coding genes that are contributing to the pool of lncRNAs; of which, genes implicated in carcinogenesis are significantly over-represented. CONCLUSION: Our comparative mammalian genomics approach coupled with evolutionary analysis identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that amongst the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms. Public Library of Science 2010-04-23 /pmc/articles/PMC2859052/ /pubmed/20428234 http://dx.doi.org/10.1371/journal.pone.0010316 Text en Khachane, Harrison. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Khachane, Amit N. Harrison, Paul M. Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs |
title | Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs |
title_full | Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs |
title_fullStr | Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs |
title_full_unstemmed | Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs |
title_short | Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs |
title_sort | mining mammalian transcript data for functional long non-coding rnas |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859052/ https://www.ncbi.nlm.nih.gov/pubmed/20428234 http://dx.doi.org/10.1371/journal.pone.0010316 |
work_keys_str_mv | AT khachaneamitn miningmammaliantranscriptdataforfunctionallongnoncodingrnas AT harrisonpaulm miningmammaliantranscriptdataforfunctionallongnoncodingrnas |