MicroRNA target prediction using thermodynamic and sequence curves

BACKGROUND: MicroRNAs (miRNAs) are small regulatory RNAs that mediate RNA interference by binding to various mRNA target regions. There have been several computational methods for the identification of target mRNAs for miRNAs. However, these have considered all contributory features as scalar represe...

Bibliographic Details
Main authors: Ghoshal, Asish; Shankar, Raghavendran; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658802/
https://www.ncbi.nlm.nih.gov/pubmed/26608597
http://dx.doi.org/10.1186/s12864-015-1933-2
author Ghoshal, Asish
Shankar, Raghavendran
Bagchi, Saurabh
Grama, Ananth
Chaterji, Somali
author_facet Ghoshal, Asish
Shankar, Raghavendran
Bagchi, Saurabh
Grama, Ananth
Chaterji, Somali
author_sort Ghoshal, Asish
collection PubMed
description BACKGROUND: MicroRNAs (miRNAs) are small regulatory RNAs that mediate RNA interference by binding to various mRNA target regions. There have been several computational methods for the identification of target mRNAs for miRNAs. However, these have considered all contributory features as scalar representations, primarily as thermodynamic or sequence-based features. Further, a majority of these methods solely target canonical sites, which are sites with “seed” complementarity. Here, we present a machine-learning classification scheme, titled Avishkar, which captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves, separately for various input features, such as thermodynamic and sequence features. Further, we use a principled approach to uniformly model canonical and non-canonical seed matches, using a novel seed enrichment metric. RESULTS: We demonstrate that a large number of seed-match patterns have high enrichment values, conserved across species, and that the majority of miRNA binding sites involve non-canonical matches, corroborating recent findings. Using spatial curves and popular categorical features, such as target site length and location, we train a linear SVM model, utilizing experimental CLIP-seq data. Our model significantly outperforms all established methods, for both canonical and non-canonical sites. We achieve this while using a much larger candidate miRNA-mRNA interaction set than prior work. CONCLUSIONS: We have developed an efficient SVM-based model for miRNA target prediction using recent CLIP-seq data, demonstrating superior performance, evaluated using ROC curves, specifically about 20% better than the state-of-the-art, for different species (human or mouse), or different target types (canonical or non-canonical). To the best of our knowledge, we provide the first distributed framework for microRNA target prediction based on Apache Hadoop and Spark.
AVAILABILITY: All source code and data are publicly available at https://bitbucket.org/cellsandmachines/avishkar.
format Online
Article
Text
id pubmed-4658802
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4658802 2015-11-26 MicroRNA target prediction using thermodynamic and sequence curves Ghoshal, Asish Shankar, Raghavendran Bagchi, Saurabh Grama, Ananth Chaterji, Somali BMC Genomics Research Article BACKGROUND: MicroRNAs (miRNAs) are small regulatory RNAs that mediate RNA interference by binding to various mRNA target regions. There have been several computational methods for the identification of target mRNAs for miRNAs. However, these have considered all contributory features as scalar representations, primarily as thermodynamic or sequence-based features. Further, a majority of these methods solely target canonical sites, which are sites with “seed” complementarity. Here, we present a machine-learning classification scheme, titled Avishkar, which captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves, separately for various input features, such as thermodynamic and sequence features. Further, we use a principled approach to uniformly model canonical and non-canonical seed matches, using a novel seed enrichment metric. RESULTS: We demonstrate that a large number of seed-match patterns have high enrichment values, conserved across species, and that the majority of miRNA binding sites involve non-canonical matches, corroborating recent findings. Using spatial curves and popular categorical features, such as target site length and location, we train a linear SVM model, utilizing experimental CLIP-seq data. Our model significantly outperforms all established methods, for both canonical and non-canonical sites. We achieve this while using a much larger candidate miRNA-mRNA interaction set than prior work.
CONCLUSIONS: We have developed an efficient SVM-based model for miRNA target prediction using recent CLIP-seq data, demonstrating superior performance, evaluated using ROC curves, specifically about 20% better than the state-of-the-art, for different species (human or mouse), or different target types (canonical or non-canonical). To the best of our knowledge, we provide the first distributed framework for microRNA target prediction based on Apache Hadoop and Spark. AVAILABILITY: All source code and data are publicly available at https://bitbucket.org/cellsandmachines/avishkar. BioMed Central 2015-11-25 /pmc/articles/PMC4658802/ /pubmed/26608597 http://dx.doi.org/10.1186/s12864-015-1933-2 Text en © Ghoshal et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Ghoshal, Asish
Shankar, Raghavendran
Bagchi, Saurabh
Grama, Ananth
Chaterji, Somali
MicroRNA target prediction using thermodynamic and sequence curves
title MicroRNA target prediction using thermodynamic and sequence curves
title_full MicroRNA target prediction using thermodynamic and sequence curves
title_fullStr MicroRNA target prediction using thermodynamic and sequence curves
title_full_unstemmed MicroRNA target prediction using thermodynamic and sequence curves
title_short MicroRNA target prediction using thermodynamic and sequence curves
title_sort microrna target prediction using thermodynamic and sequence curves
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4658802/
https://www.ncbi.nlm.nih.gov/pubmed/26608597
http://dx.doi.org/10.1186/s12864-015-1933-2
work_keys_str_mv AT ghoshalasish micrornatargetpredictionusingthermodynamicandsequencecurves
AT shankarraghavendran micrornatargetpredictionusingthermodynamicandsequencecurves
AT bagchisaurabh micrornatargetpredictionusingthermodynamicandsequencecurves
AT gramaananth micrornatargetpredictionusingthermodynamicandsequencecurves
AT chaterjisomali micrornatargetpredictionusingthermodynamicandsequencecurves