Cargando…

Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine

BACKGROUND: MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcription or suppressing translation. Studies ha...

Descripción completa

Detalles Bibliográficos
Autores principales: Meng, Jun, Liu, Dong, Sun, Chao, Luan, Yushi
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310204/
https://www.ncbi.nlm.nih.gov/pubmed/25547126
http://dx.doi.org/10.1186/s12859-014-0423-x
_version_ 1782354829807452160
author Meng, Jun
Liu, Dong
Sun, Chao
Luan, Yushi
author_facet Meng, Jun
Liu, Dong
Sun, Chao
Luan, Yushi
author_sort Meng, Jun
collection PubMed
description BACKGROUND: MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcription or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identification of these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far less than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs. RESULTS: A novel classification model based on a support vector machine (SVM) was trained to identify real and pseudo plant pre-miRNAs together with their miRNAs. An initial set of 152 novel features related to sequential structures was used to train the model. By applying feature selection, we obtained the best subset of 47 features for use with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method for the classification of plant pre-miRNAs. Using this method, 63 features were obtained for plant miRNA classification. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. This model achieved approximately 90% accuracy using plant datasets from nine plant species, including Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence. CONCLUSIONS: We developed an integrated classification model, miPlantPreMat, based on structure-sequence features and SVM. MiPlantPreMat was used to identify both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0423-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4310204
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-43102042015-02-03 Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine Meng, Jun Liu, Dong Sun, Chao Luan, Yushi BMC Bioinformatics Methodology Article BACKGROUND: MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcription or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identification of these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far less than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs. RESULTS: A novel classification model based on a support vector machine (SVM) was trained to identify real and pseudo plant pre-miRNAs together with their miRNAs. An initial set of 152 novel features related to sequential structures was used to train the model. By applying feature selection, we obtained the best subset of 47 features for use with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method for the classification of plant pre-miRNAs. Using this method, 63 features were obtained for plant miRNA classification. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. This model achieved approximately 90% accuracy using plant datasets from nine plant species, including Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence. CONCLUSIONS: We developed an integrated classification model, miPlantPreMat, based on structure-sequence features and SVM. MiPlantPreMat was used to identify both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0423-x) contains supplementary material, which is available to authorized users. BioMed Central 2014-12-30 /pmc/articles/PMC4310204/ /pubmed/25547126 http://dx.doi.org/10.1186/s12859-014-0423-x Text en © Meng et al.; licensee BioMed Central. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Meng, Jun
Liu, Dong
Sun, Chao
Luan, Yushi
Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine
title Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine
title_full Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine
title_fullStr Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine
title_full_unstemmed Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine
title_short Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine
title_sort prediction of plant pre-micrornas and their micrornas in genome-scale sequences using structure-sequence features and support vector machine
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310204/
https://www.ncbi.nlm.nih.gov/pubmed/25547126
http://dx.doi.org/10.1186/s12859-014-0423-x
work_keys_str_mv AT mengjun predictionofplantpremicrornasandtheirmicrornasingenomescalesequencesusingstructuresequencefeaturesandsupportvectormachine
AT liudong predictionofplantpremicrornasandtheirmicrornasingenomescalesequencesusingstructuresequencefeaturesandsupportvectormachine
AT sunchao predictionofplantpremicrornasandtheirmicrornasingenomescalesequencesusingstructuresequencefeaturesandsupportvectormachine
AT luanyushi predictionofplantpremicrornasandtheirmicrornasingenomescalesequencesusingstructuresequencefeaturesandsupportvectormachine