
Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine


Bibliographic Details
Main Authors: Mao, Rui; Raj Kumar, Praveen Kumar; Guo, Cheng; Zhang, Yang; Liang, Chun
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4128822/
https://www.ncbi.nlm.nih.gov/pubmed/25110928
http://dx.doi.org/10.1371/journal.pone.0104049
collection PubMed
description One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows the creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on their expression regulation, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and a support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. By comparing the coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach classified RIs from CSIs more accurately than four other approaches. The optimal penalty parameter C and the RBF kernel parameter γ in the SVM were set using a particle swarm optimization algorithm (PSOSVM). Our classification performance showed an F-measure of 80.8% (random forest) and 77.4% (PSOSVM). Not only were the basic sequence features and positional distribution characteristics of RIs obtained, but putative regulatory motifs in intron splicing were also predicted based on our feature extraction approach. Clearly, our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.
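The description names three building blocks: an RBF kernel with parameter γ, the F-measure used to report performance, and particle swarm optimization for choosing C and γ. The following is a minimal standard-library sketch of those ideas, not the authors' implementation. The function names, the PSO weights (`w`, `c1`, `c2`), the log-scale search space, and `toy_error` (a hypothetical stand-in for a cross-validated SVM error surface) are all assumptions made for illustration.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pso_minimize(objective, bounds, n_particles=20, n_iter=60, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed values)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's personal best
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_error(params):
    """Hypothetical error surface over (log10 C, log10 gamma); not real SVM error."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


if __name__ == "__main__":
    print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0))
    print(f_measure(tp=8, fp=2, fn=2))
    print(pso_minimize(toy_error, bounds=[(-5.0, 5.0), (-5.0, 5.0)]))
```

In a real PSOSVM setup the objective passed to `pso_minimize` would train an SVM at each candidate (C, γ) and return a cross-validated error; searching in log space is a common convention for these two parameters and is assumed here, not taken from the record.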
format Online
Article
Text
id pubmed-4128822
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4128822 2014-08-12 PLoS One Research Article Public Library of Science 2014-08-11 /pmc/articles/PMC4128822/ /pubmed/25110928 http://dx.doi.org/10.1371/journal.pone.0104049 Text en © 2014 Mao et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine
topic Research Article