
Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data

BACKGROUND: Several methods exist for the prediction of precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequences) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, that in general...


Bibliographic Details
Main authors: Higashi, Susan; Fournier, Cyril; Gautier, Christian; Gaspin, Christine; Sagot, Marie-France
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448272/
https://www.ncbi.nlm.nih.gov/pubmed/26022464
http://dx.doi.org/10.1186/s12859-015-0594-0
author Higashi, Susan
Fournier, Cyril
Gautier, Christian
Gaspin, Christine
Sagot, Marie-France
author_sort Higashi, Susan
collection PubMed
description BACKGROUND: Several methods exist for the prediction of precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequences) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, which is generally identified using RNA folders whose complexity is cubic in the size of the input. The vast majority of pre-miRNA predictors then rely on further information learned from previously validated miRNAs from the same or a closely related genome for the final prediction of new miRNAs. With this paper, we wished to address three main issues. The first was methodological and aimed at obtaining a more time-efficient predictor without losing accuracy, which represented the second issue. We indeed aimed at better predicting miRNAs at a genome scale, but also from sRNA-seq data where in some cases, notably in plants, the current folding methods often infer the wrong structure. The third issue is that it is important to rely as little as possible on previously recorded examples of miRNAs. We therefore also sought a method that is less dependent on previous miRNA records. RESULTS: Concerning the first and second issues, we present a novel alternative to a classical folder based on a thermodynamic Nearest-Neighbour (NN) model for computing the free energy and predicting the classical hairpin structure of a pre-miRNA. We show that the free energies thus computed correlate well with those of RNAfold. This novel method, called Mirinho, has quadratic instead of cubic complexity and is also much more efficient in practice. When applied to sRNA-seq data of plants, it generally gives better results than classical folders. On the third issue, we show that Mirinho, which uses as its only knowledge the length of the loops and stem-arms and the free energy of the pre-miRNA hairpin, compares well with algorithms that require more information. The results, obtained with different datasets, are indeed similar to those of other approaches with which such a comparison was possible. These needed to be publicly available software tools that could be run on a large input. In some cases, Mirinho is even better in terms of sensitivity or precision. CONCLUSION: We provide a simpler and much faster method with very reasonable sensitivity and precision, which can be applied without special adaptation to the prediction of both animal and plant pre-miRNAs, using as input either genomic sequences or sRNA-seq data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0594-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4448272
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4448272 2015-05-30 Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data Higashi, Susan Fournier, Cyril Gautier, Christian Gaspin, Christine Sagot, Marie-France BMC Bioinformatics Methodology Article BioMed Central 2015-05-29 /pmc/articles/PMC4448272/ /pubmed/26022464 http://dx.doi.org/10.1186/s12859-015-0594-0 Text en © Higashi et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data
title_sort mirinho: an efficient and general plant and animal pre-mirna predictor for genomic and deep sequencing data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448272/
https://www.ncbi.nlm.nih.gov/pubmed/26022464
http://dx.doi.org/10.1186/s12859-015-0594-0