Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data
BACKGROUND: Several methods exist for the prediction of precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequences) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, that in general...
Main authors: | Higashi, Susan; Fournier, Cyril; Gautier, Christian; Gaspin, Christine; Sagot, Marie-France |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448272/ https://www.ncbi.nlm.nih.gov/pubmed/26022464 http://dx.doi.org/10.1186/s12859-015-0594-0 |
_version_ | 1782373684569178112 |
---|---|
author | Higashi, Susan; Fournier, Cyril; Gautier, Christian; Gaspin, Christine; Sagot, Marie-France |
author_facet | Higashi, Susan; Fournier, Cyril; Gautier, Christian; Gaspin, Christine; Sagot, Marie-France |
author_sort | Higashi, Susan |
collection | PubMed |
description | BACKGROUND: Several methods exist for the prediction of precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequences) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, which is in general identified using RNA folders whose complexity is cubic in the size of the input. The vast majority of pre-miRNA predictors then rely on further information learned from previously validated miRNAs from the same or a closely related genome for the final prediction of new miRNAs. With this paper, we wished to address three main issues. The first was methodological and aimed at obtaining a more time-efficient predictor without losing accuracy, which represented the second issue. We indeed aimed at better predicting miRNAs at a genome scale, but also from sRNA-seq data where in some cases, notably in plants, the current folding methods often infer the wrong structure. The third issue is that it is important to rely as little as possible on previously recorded examples of miRNAs. We therefore also sought a method that is less dependent on previous miRNA records. RESULTS: Concerning the first and second issues, we present a novel alternative to a classical folder based on a thermodynamic Nearest-Neighbour (NN) model for computing the free energy and predicting the classical hairpin structure of a pre-miRNA. We show that the free energies thus computed correlate well with those of RNAfold. This novel method, called Mirinho, has quadratic instead of cubic complexity and is also much more efficient in practice. When applied to sRNA-seq data of plants, it gives in general better results than classical folders. On the third issue, we show that Mirinho, which uses as its only knowledge the length of the loops and stem-arms and the free energy of the pre-miRNA hairpin, compares well with algorithms that require more information. The results, obtained with different datasets, are indeed similar to those of other approaches with which such a comparison was possible. These needed to be publicly available software tools that could be used on a large input. In some cases, Mirinho is even better in terms of sensitivity or precision. CONCLUSION: We provide a simpler and much faster method with very reasonable sensitivity and precision, which can be applied without special adaptation to the prediction of both animal and plant pre-miRNAs, using as input either genomic sequences or sRNA-seq data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0594-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4448272 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4448272 2015-05-30 Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data Higashi, Susan; Fournier, Cyril; Gautier, Christian; Gaspin, Christine; Sagot, Marie-France BMC Bioinformatics Methodology Article BACKGROUND: Several methods exist for the prediction of precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequences) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, which is in general identified using RNA folders whose complexity is cubic in the size of the input. The vast majority of pre-miRNA predictors then rely on further information learned from previously validated miRNAs from the same or a closely related genome for the final prediction of new miRNAs. With this paper, we wished to address three main issues. The first was methodological and aimed at obtaining a more time-efficient predictor without losing accuracy, which represented the second issue. We indeed aimed at better predicting miRNAs at a genome scale, but also from sRNA-seq data where in some cases, notably in plants, the current folding methods often infer the wrong structure. The third issue is that it is important to rely as little as possible on previously recorded examples of miRNAs. We therefore also sought a method that is less dependent on previous miRNA records. RESULTS: Concerning the first and second issues, we present a novel alternative to a classical folder based on a thermodynamic Nearest-Neighbour (NN) model for computing the free energy and predicting the classical hairpin structure of a pre-miRNA. We show that the free energies thus computed correlate well with those of RNAfold. This novel method, called Mirinho, has quadratic instead of cubic complexity and is also much more efficient in practice. When applied to sRNA-seq data of plants, it gives in general better results than classical folders. On the third issue, we show that Mirinho, which uses as its only knowledge the length of the loops and stem-arms and the free energy of the pre-miRNA hairpin, compares well with algorithms that require more information. The results, obtained with different datasets, are indeed similar to those of other approaches with which such a comparison was possible. These needed to be publicly available software tools that could be used on a large input. In some cases, Mirinho is even better in terms of sensitivity or precision. CONCLUSION: We provide a simpler and much faster method with very reasonable sensitivity and precision, which can be applied without special adaptation to the prediction of both animal and plant pre-miRNAs, using as input either genomic sequences or sRNA-seq data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0594-0) contains supplementary material, which is available to authorized users. BioMed Central 2015-05-29 /pmc/articles/PMC4448272/ /pubmed/26022464 http://dx.doi.org/10.1186/s12859-015-0594-0 Text en © Higashi et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Higashi, Susan Fournier, Cyril Gautier, Christian Gaspin, Christine Sagot, Marie-France Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data |
title | Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data |
title_full | Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data |
title_fullStr | Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data |
title_full_unstemmed | Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data |
title_short | Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data |
title_sort | mirinho: an efficient and general plant and animal pre-mirna predictor for genomic and deep sequencing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448272/ https://www.ncbi.nlm.nih.gov/pubmed/26022464 http://dx.doi.org/10.1186/s12859-015-0594-0 |
work_keys_str_mv | AT higashisusan mirinhoanefficientandgeneralplantandanimalpremirnapredictorforgenomicanddeepsequencingdata AT fourniercyril mirinhoanefficientandgeneralplantandanimalpremirnapredictorforgenomicanddeepsequencingdata AT gautierchristian mirinhoanefficientandgeneralplantandanimalpremirnapredictorforgenomicanddeepsequencingdata AT gaspinchristine mirinhoanefficientandgeneralplantandanimalpremirnapredictorforgenomicanddeepsequencingdata AT sagotmariefrance mirinhoanefficientandgeneralplantandanimalpremirnapredictorforgenomicanddeepsequencingdata |