Identification of a dinucleotide signature that discriminates coding from non-coding long RNAs
To date, the main criterion by which long ncRNAs (lncRNAs) are discriminated from mRNAs is based on the capacity of the transcripts to encode a protein. However, it becomes important to identify non-ORF-based sequence characteristics that can be used to parse between ncRNAs and mRNAs. In this study,...
Main authors: | Ulveling, Damien; Dinger, Marcel E.; Francastel, Claire; Hubé, Florent |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2014 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4158813/ https://www.ncbi.nlm.nih.gov/pubmed/25250049 http://dx.doi.org/10.3389/fgene.2014.00316 |
_version_ | 1782334124507267072 |
---|---|
author | Ulveling, Damien Dinger, Marcel E. Francastel, Claire Hubé, Florent |
author_sort | Ulveling, Damien |
collection | PubMed |
description | To date, the main criterion by which long ncRNAs (lncRNAs) are discriminated from mRNAs is based on the capacity of the transcripts to encode a protein. However, it becomes important to identify non-ORF-based sequence characteristics that can be used to parse between ncRNAs and mRNAs. In this study, we first established an extremely selective workflow to define a highly refined database of lncRNAs which was used for comparison with mRNAs. Then using this highly selective collection of lncRNAs, we found the CG dinucleotide frequencies were clearly distinct. In addition, we showed that the bias in CG dinucleotide frequency was conserved in human and mouse genomes. We propose that this sequence feature will serve as a useful classifier in transcript classification pipelines. We also suggest that our refined database of “bona fide” lncRNAs will be valuable for the discovery of other sequence characteristics distinct to lncRNAs. |
format | Online Article Text |
id | pubmed-4158813 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-41588132014-09-23 Identification of a dinucleotide signature that discriminates coding from non-coding long RNAs Ulveling, Damien Dinger, Marcel E. Francastel, Claire Hubé, Florent Front Genet Genetics To date, the main criterion by which long ncRNAs (lncRNAs) are discriminated from mRNAs is based on the capacity of the transcripts to encode a protein. However, it becomes important to identify non-ORF-based sequence characteristics that can be used to parse between ncRNAs and mRNAs. In this study, we first established an extremely selective workflow to define a highly refined database of lncRNAs which was used for comparison with mRNAs. Then using this highly selective collection of lncRNAs, we found the CG dinucleotide frequencies were clearly distinct. In addition, we showed that the bias in CG dinucleotide frequency was conserved in human and mouse genomes. We propose that this sequence feature will serve as a useful classifier in transcript classification pipelines. We also suggest that our refined database of “bona fide” lncRNAs will be valuable for the discovery of other sequence characteristics distinct to lncRNAs. Frontiers Media S.A. 2014-09-09 /pmc/articles/PMC4158813/ /pubmed/25250049 http://dx.doi.org/10.3389/fgene.2014.00316 Text en Copyright © 2014 Ulveling, Dinger, Francastel and Hubé. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Identification of a dinucleotide signature that discriminates coding from non-coding long RNAs |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4158813/ https://www.ncbi.nlm.nih.gov/pubmed/25250049 http://dx.doi.org/10.3389/fgene.2014.00316 |
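
The description above names CG dinucleotide frequency as the feature that separates lncRNAs from mRNAs. The following is a minimal sketch of how such a frequency could be computed for a single transcript. It is not the authors' pipeline: the record does not state their normalization or thresholds, so the overlapping-window counting, the handling of ambiguous bases, and the function names `dinucleotide_frequencies` and `cg_frequency` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def dinucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each overlapping dinucleotide in seq.

    Assumption: RNA input is accepted by mapping U to T, and any pair
    containing a base other than A, C, G, T is skipped.
    """
    seq = seq.upper().replace("U", "T")
    # Slide a window of width 2 along the sequence, one base at a time.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made of unambiguous bases.
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    total = len(pairs)
    if total == 0:
        return {}
    counts = Counter(pairs)
    return {dinuc: n / total for dinuc, n in counts.items()}


def cg_frequency(seq: str) -> float:
    """Fraction of overlapping dinucleotides in seq that are CG."""
    return dinucleotide_frequencies(seq).get("CG", 0.0)
```

Computing `cg_frequency` for each transcript in a coding set and a non-coding set, then comparing the two distributions, would be one way to look for the kind of bias the description reports; any cutoff used to classify transcripts would have to come from the article itself.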