
Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees

Bibliographic Details
Main authors: Williams, Philip H., Eyles, Rod, Weiller, Georg
Format: Online Article, Text
Language: English
Published: Hindawi Publishing Corporation, 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3503367/
https://www.ncbi.nlm.nih.gov/pubmed/23209882
http://dx.doi.org/10.1155/2012/652979
Collection: PubMed

Abstract: MicroRNAs (miRNAs) are non-protein-coding RNAs, 20 to 22 nucleotides long, that attenuate protein production. Different types of sequence data, including genomic and transcriptomic sequences, are being investigated for novel miRNAs. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other non-protein-coding sequences. MirTools, mirDeep2, and miRanalyzer require a "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor on a cross-section of different species that accurately predicts miRNAs outside the training set. We wanted a system that did not require read counts for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It requires only that the corresponding genome or transcriptome be available within a sequence window that includes the precursor candidate, so that the required sequence features can be collected. Some of the most critical features for training the predictor are the miRNA:miRNA(∗) duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.
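The abstract reports sensitivity and specificity under leave-one-out validation of a decision-tree classifier whose key features include the miRNA:miRNA(∗) duplex energy and the number of duplex mismatches. The Python sketch below only illustrates how those two metrics and a leave-one-out loop fit together; it is not the authors' pipeline. Assumptions: a single-split decision stump stands in for C5.0, the helper names (`train_stump`, `leave_one_out`, `TOY_DATA`) are hypothetical, the toy values are invented, and each fold holds out one sample (this record does not say how the paper grouped its held-out data).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical feature vector: (duplex_energy_kcal_per_mol, duplex_mismatches).
Features = Tuple[float, int]
# A labelled sample: True = miRNA precursor, False = negative example.
Sample = Tuple[Features, bool]


def sensitivity_specificity(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float]:
    """Return (TP / (TP + FN), TN / (TN + FP)); assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def train_stump(train: List[Sample]) -> Callable[[Features], bool]:
    """Hypothetical stand-in for C5.0: a one-split tree (decision stump).

    Tries every observed value of every feature as a threshold and keeps the
    split "feature <= threshold -> miRNA" with the fewest training errors.
    Lower duplex energy (more stable) and fewer mismatches both point towards
    a miRNA, so only the "<=" direction is considered (an assumption).
    """
    best = None  # (errors, feature_index, threshold)
    for f in range(2):
        for threshold in sorted({x[f] for x, _ in train}):
            errors = sum(1 for x, y in train if (x[f] <= threshold) != y)
            if best is None or errors < best[0]:
                best = (errors, f, threshold)
    _, f, threshold = best
    return lambda x: x[f] <= threshold


def leave_one_out(data: List[Sample]) -> Tuple[float, float]:
    """Hold out each sample in turn, train on the rest, and score the held-out one."""
    y_true, y_pred = [], []
    for i, (x, y) in enumerate(data):
        model = train_stump(data[:i] + data[i + 1:])  # every sample except i
        y_true.append(y)
        y_pred.append(model(x))
    return sensitivity_specificity(y_true, y_pred)


# Invented toy values for illustration only; not data from the paper.
TOY_DATA: List[Sample] = [
    ((-40.0, 1), True),
    ((-35.0, 2), True),
    ((-38.0, 0), True),
    ((-12.0, 6), False),
    ((-15.0, 5), False),
    ((-10.0, 7), False),
]

if __name__ == "__main__":
    sens, spec = leave_one_out(TOY_DATA)
    print(f"sensitivity={sens:.2%} specificity={spec:.2%}")
```

On `TOY_DATA` the sketch yields a sensitivity of 2/3 and a specificity of 1.0: when the positive at -35.0 is held out, the stump fitted to the remaining positives places its threshold at -38.0 and misses it. A real C5.0 tree grows multiple splits and prunes them, and the paper's model uses more features than these two.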
Record ID: pubmed-3503367
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Nucleic Acids (Research Article)
Published online: 2012-11-07
Copyright © 2012 Philip H. Williams et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.