
CoRAL: predicting non-coding RNAs from small RNA-sequencing data

The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in...


Bibliographic Details
Main authors: Leung, Yuk Yee, Ryvkin, Paul, Ungar, Lyle H., Gregory, Brian D., Wang, Li-San
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3737537/
https://www.ncbi.nlm.nih.gov/pubmed/23700308
http://dx.doi.org/10.1093/nar/gkt426
_version_ 1782279875219947520
author Leung, Yuk Yee
Ryvkin, Paul
Ungar, Lyle H.
Gregory, Brian D.
Wang, Li-San
author_sort Leung, Yuk Yee
collection PubMed
description The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.
format Online
Article
Text
id pubmed-3737537
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3737537 2013-08-08 CoRAL: predicting non-coding RNAs from small RNA-sequencing data Leung, Yuk Yee Ryvkin, Paul Ungar, Lyle H. Gregory, Brian D. Wang, Li-San Nucleic Acids Res Methods Online Oxford University Press 2013-08 2013-05-21 /pmc/articles/PMC3737537/ /pubmed/23700308 http://dx.doi.org/10.1093/nar/gkt426 Text en © The Author(s) 2013. Published by Oxford University Press.
http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CoRAL: predicting non-coding RNAs from small RNA-sequencing data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3737537/
https://www.ncbi.nlm.nih.gov/pubmed/23700308
http://dx.doi.org/10.1093/nar/gkt426