In silico miRNA prediction in metazoan genomes: balancing between sensitivity and specificity
Main authors: | van der Burgt, Ate; Fiers, Mark WJE; Nap, Jan-Peter; van Ham, Roeland CHJ |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2688010/ https://www.ncbi.nlm.nih.gov/pubmed/19405940 http://dx.doi.org/10.1186/1471-2164-10-204 |
author | van der Burgt, Ate; Fiers, Mark WJE; Nap, Jan-Peter; van Ham, Roeland CHJ |
collection | PubMed |
description | BACKGROUND: MicroRNAs (miRNAs), short ~21-nucleotide RNA molecules, play an important role in post-transcriptional regulation of gene expression. The number of known miRNA hairpins registered in the miRBase database is rapidly increasing, but recent reports suggest that many miRNAs with restricted temporal or tissue-specific expression remain undiscovered. Various strategies for in silico miRNA identification have been proposed to facilitate miRNA discovery. Notably, support vector machine (SVM) methods have recently gained popularity. However, a drawback of these methods is that they do not provide insight into the biological properties of miRNA sequences. RESULTS: Here we propose a new strategy for miRNA hairpin prediction in which the likelihood that a genomic hairpin is a true miRNA hairpin is evaluated based on statistical distributions of observed biological variation of properties (descriptors) of known miRNA hairpins. These distributions are transformed into a single, continuous outcome classifier called the L score. Using a dataset of known miRNA hairpins from the miRBase database and an exhaustive set of genomic hairpins identified in the genome of Caenorhabditis elegans, a subset of the 18 most informative descriptors was selected after detailed analysis of the correlation among, and the discriminative power of, individual descriptors. We show that the majority of previously identified miRNA hairpins have high L scores, that the method outperforms miRNA prediction by threshold filtering, and that it is more transparent than SVM classifiers. CONCLUSION: The L score is applicable as a prediction classifier with high sensitivity for novel miRNA hairpins.
The L-score approach can be used to rank and select interesting miRNA hairpin candidates for downstream experimental analysis when coupled to a genome-wide set of in silico-identified hairpins, or to facilitate the analysis of large sets of putative miRNA hairpin loci obtained in deep-sequencing efforts of small RNAs. Moreover, the in-depth analyses of miRNA hairpin descriptors preceding and determining the L-score outcome could be used as an extension to miRBase entries to help increase the reliability and biological relevance of the miRNA registry. |
format | Text |
id | pubmed-2688010 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Genomics |
publication_date | 2009-04-30 |
rights | Copyright © 2009 van der Burgt et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | In silico miRNA prediction in metazoan genomes: balancing between sensitivity and specificity |
topic | Research Article |
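The abstract describes the L score only in outline. Per-descriptor distributions from known miRNA hairpins are combined into one continuous score, and that score is contrasted with threshold filtering. The Python sketch that follows is a hypothetical illustration of that general idea under stated assumptions. It is not the authors' published formula. Several parts are invented for illustration:

* The descriptor names `mfe` and `gc`.
* The toy values.
* The mid-rank two-sided tail probability.
* The averaging of log10 values and the `eps` floor.
* The names `HairpinScorer`, `mid_rank_cdf` and `sensitivity_specificity`.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple

Hairpin = Dict[str, float]  # descriptor name -> value (hypothetical representation)


def mid_rank_cdf(sorted_ref: Sequence[float], x: float) -> float:
    """Mid-rank empirical CDF: ties count half, so a value equal to the
    reference median maps to 0.5."""
    below = bisect.bisect_left(sorted_ref, x)
    at_or_below = bisect.bisect_right(sorted_ref, x)
    return (below + at_or_below) / (2 * len(sorted_ref))


class HairpinScorer:
    """Hypothetical likelihood-style scorer, NOT the published L-score formula.

    Each descriptor is compared with its distribution over known miRNA
    hairpins; per-descriptor tail probabilities are combined into one
    continuous score (0 = most typical, more negative = less typical).
    """

    def __init__(self, known: List[Hairpin], descriptors: List[str], eps: float = 1e-3):
        self.descriptors = descriptors
        self.eps = eps  # floor so that log10 never sees 0
        # Sorted reference values of each descriptor over the known hairpins.
        self.ref = {d: sorted(h[d] for h in known) for d in descriptors}

    def tail_prob(self, d: str, x: float) -> float:
        # Two-sided empirical tail probability under the known-miRNA distribution.
        f = mid_rank_cdf(self.ref[d], x)
        return max(2.0 * min(f, 1.0 - f), self.eps)

    def score(self, hairpin: Hairpin) -> float:
        # Mean log10 tail probability across all descriptors.
        logs = [math.log10(self.tail_prob(d, hairpin[d])) for d in self.descriptors]
        return sum(logs) / len(logs)

    def passes_thresholds(self, hairpin: Hairpin) -> bool:
        # Binary filter for comparison: every descriptor must fall inside the
        # observed range of the known miRNA hairpins.
        return all(self.ref[d][0] <= hairpin[d] <= self.ref[d][-1] for d in self.descriptors)


def sensitivity_specificity(pos: List[float], neg: List[float], cutoff: float) -> Tuple[float, float]:
    """Fraction of true hairpins kept (score >= cutoff) and of background
    hairpins rejected (score < cutoff)."""
    tp = sum(s >= cutoff for s in pos)
    tn = sum(s < cutoff for s in neg)
    return tp / len(pos), tn / len(neg)


# Toy reference set with two hypothetical descriptors (values invented for illustration).
KNOWN: List[Hairpin] = [
    {"mfe": -30.0, "gc": 0.40},
    {"mfe": -35.0, "gc": 0.45},
    {"mfe": -40.0, "gc": 0.50},
    {"mfe": -25.0, "gc": 0.42},
    {"mfe": -32.0, "gc": 0.48},
]

if __name__ == "__main__":
    scorer = HairpinScorer(KNOWN, ["mfe", "gc"])
    candidates = {
        "typical": {"mfe": -32.0, "gc": 0.45},
        "edge": {"mfe": -40.0, "gc": 0.50},
        "outlier": {"mfe": -10.0, "gc": 0.70},
    }
    # Rank candidates by score, highest (most miRNA-like) first.
    for name, h in sorted(candidates.items(), key=lambda kv: scorer.score(kv[1]), reverse=True):
        print(f"{name}: score={scorer.score(h):.2f} passes_range={scorer.passes_thresholds(h)}")
```

Run as a script, the sketch ranks `typical` above `edge` above `outlier`. The range filter accepts both `typical` and `edge` alike, which shows how a continuous score can still order candidates that a binary filter cannot separate. Moving the cutoff passed to `sensitivity_specificity` trades sensitivity against specificity.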