Cargando…

Ab initio identification of human microRNAs based on structure motifs

BACKGROUND: MicroRNAs (miRNAs) are short, non-coding RNA molecules that are directly involved in post-transcriptional regulation of gene expression. The mature miRNA sequence binds to more or less specific target sites on the mRNA. Both their small size and sequence specificity make the detection of...

Descripción completa

Detalles Bibliográficos
Autores principales: Brameier, Markus, Wiuf, Carsten
Formato: Texto
Lenguaje:English
Publicado: BioMed Central 2007
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238772/
https://www.ncbi.nlm.nih.gov/pubmed/18088431
http://dx.doi.org/10.1186/1471-2105-8-478
_version_ 1782150461006020608
author Brameier, Markus
Wiuf, Carsten
author_facet Brameier, Markus
Wiuf, Carsten
author_sort Brameier, Markus
collection PubMed
description BACKGROUND: MicroRNAs (miRNAs) are short, non-coding RNA molecules that are directly involved in post-transcriptional regulation of gene expression. The mature miRNA sequence binds to more or less specific target sites on the mRNA. Both their small size and sequence specificity make the detection of completely new miRNAs a challenging task. This cannot be based on sequence information alone, but requires structure information about the miRNA precursor. Unlike comparative genomics approaches, ab initio approaches are able to discover species-specific miRNAs without known sequence homology. RESULTS: MiRPred is a novel method for ab initio prediction of miRNAs by genome scanning that only relies on (predicted) secondary structure to distinguish miRNA precursors from other similar-sized segments of the human genome. We apply a machine learning technique, called linear genetic programming, to develop special classifier programs which include multiple regular expressions (motifs) matched against the secondary structure sequence. Special attention is paid to scanning issues. The classifiers are trained on fixed-length sequences as these occur when shifting a window in regular steps over a genome region. Various statistical and empirical evidence is collected to validate the correctness of and increase confidence in the predicted structures. Among other things, we propose a new criterion to select miRNA candidates with a higher stability of folding that is based on the number of matching windows around their genome location. An ensemble of 16 motif-based classifiers achieves 99.9 percent specificity with sensitivity remaining on an acceptable high level when requiring all classifiers to agree on a positive decision. A low false positive rate is considered more important than a low false negative rate, when searching larger genome regions for unknown miRNAs. 117 new miRNAs have been predicted close to known miRNAs on human chromosome 19. All candidate structures match the free energy distribution of miRNA precursors which is significantly shifted towards lower free energies. We employed a human EST library and found that around 75 percent of the candidate sequences are likely to be transcribed, with around 35 percent located in introns. CONCLUSION: Our motif finding method is at least competitive to state-of-the-art feature-based methods for ab initio miRNA discovery. In doing so, it requires less previous knowledge about miRNA precursor structures while programs and motifs allow a more straightforward interpretation and extraction of the acquired knowledge.
format Text
id pubmed-2238772
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-22387722008-02-12 Ab initio identification of human microRNAs based on structure motifs Brameier, Markus Wiuf, Carsten BMC Bioinformatics Methodology Article BACKGROUND: MicroRNAs (miRNAs) are short, non-coding RNA molecules that are directly involved in post-transcriptional regulation of gene expression. The mature miRNA sequence binds to more or less specific target sites on the mRNA. Both their small size and sequence specificity make the detection of completely new miRNAs a challenging task. This cannot be based on sequence information alone, but requires structure information about the miRNA precursor. Unlike comparative genomics approaches, ab initio approaches are able to discover species-specific miRNAs without known sequence homology. RESULTS: MiRPred is a novel method for ab initio prediction of miRNAs by genome scanning that only relies on (predicted) secondary structure to distinguish miRNA precursors from other similar-sized segments of the human genome. We apply a machine learning technique, called linear genetic programming, to develop special classifier programs which include multiple regular expressions (motifs) matched against the secondary structure sequence. Special attention is paid to scanning issues. The classifiers are trained on fixed-length sequences as these occur when shifting a window in regular steps over a genome region. Various statistical and empirical evidence is collected to validate the correctness of and increase confidence in the predicted structures. Among other things, we propose a new criterion to select miRNA candidates with a higher stability of folding that is based on the number of matching windows around their genome location. An ensemble of 16 motif-based classifiers achieves 99.9 percent specificity with sensitivity remaining on an acceptable high level when requiring all classifiers to agree on a positive decision. A low false positive rate is considered more important than a low false negative rate, when searching larger genome regions for unknown miRNAs. 117 new miRNAs have been predicted close to known miRNAs on human chromosome 19. All candidate structures match the free energy distribution of miRNA precursors which is significantly shifted towards lower free energies. We employed a human EST library and found that around 75 percent of the candidate sequences are likely to be transcribed, with around 35 percent located in introns. CONCLUSION: Our motif finding method is at least competitive to state-of-the-art feature-based methods for ab initio miRNA discovery. In doing so, it requires less previous knowledge about miRNA precursor structures while programs and motifs allow a more straightforward interpretation and extraction of the acquired knowledge. BioMed Central 2007-12-18 /pmc/articles/PMC2238772/ /pubmed/18088431 http://dx.doi.org/10.1186/1471-2105-8-478 Text en Copyright © 2007 Brameier and Wiuf; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Brameier, Markus
Wiuf, Carsten
Ab initio identification of human microRNAs based on structure motifs
title Ab initio identification of human microRNAs based on structure motifs
title_full Ab initio identification of human microRNAs based on structure motifs
title_fullStr Ab initio identification of human microRNAs based on structure motifs
title_full_unstemmed Ab initio identification of human microRNAs based on structure motifs
title_short Ab initio identification of human microRNAs based on structure motifs
title_sort ab initio identification of human micrornas based on structure motifs
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238772/
https://www.ncbi.nlm.nih.gov/pubmed/18088431
http://dx.doi.org/10.1186/1471-2105-8-478
work_keys_str_mv AT brameiermarkus abinitioidentificationofhumanmicrornasbasedonstructuremotifs
AT wiufcarsten abinitioidentificationofhumanmicrornasbasedonstructuremotifs