Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features

BACKGROUND: RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence utilizing a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences a regression kernel Support Vector Machine (SVM) appr...

Bibliographic Details
Main author: Peek, Andrew S
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1906837/
https://www.ncbi.nlm.nih.gov/pubmed/17553157
http://dx.doi.org/10.1186/1471-2105-8-182
_version_ 1782134039208001536
author Peek, Andrew S
author_facet Peek, Andrew S
author_sort Peek, Andrew S
collection PubMed
description BACKGROUND: RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence utilizing a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities. RESULTS: Eight overall feature mapping methods were compared in their abilities to build SVM regression models that predict published siRNA activities. The primary factors in predictive SVM models are position specific nucleotide compositions. The secondary factors are position independent sequence motifs (N-grams) and guide strand to passenger strand sequence thermodynamics. Finally, the factors that are least contributory but are still predictive of efficacy are measures of intramolecular guide strand secondary structure and target strand secondary structure. Of these, the site of the 5' most base of the guide strand is the most informative. CONCLUSION: The capacity of specific feature mapping methods and their ability to build predictive models of RNAi activity suggests a relative biological importance of these features. Some feature mapping methods are more informative in building predictive models and overall t-test filtering provides a method to remove some noisy features or make comparisons among datasets. Together, these features can yield predictive SVM regression models with increased predictive accuracy between predicted and observed activities both within datasets by cross validation, and between independently collected RNAi activity datasets. Feature filtering to remove features should be approached carefully in that it is possible to reduce feature set size without substantially reducing predictive models, but the features retained in the candidate models become increasingly distinct. 
Software to perform feature prediction and SVM training and testing on nucleic acid sequences can be found at the following site: .
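The abstract names position specific nucleotide compositions, position independent N-grams, and t-test filtering, but this record does not say how they were encoded. The following is a minimal sketch of one plausible reading, not the author's software. The one-hot encoding, the `ACGU` alphabet, the split of activities into high and low groups at a `cutoff`, the use of Welch's t statistic, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, variance

ALPHABET = "ACGU"  # assumed RNA alphabet; the record does not specify one


def position_specific_features(seq):
    """One-hot encode each position: four binary features per nucleotide."""
    return [1.0 if base == letter else 0.0 for base in seq for letter in ALPHABET]


def ngram_features(seq, n=2):
    """Count each possible n-gram wherever it occurs (position independent)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    return [float(counts["".join(gram)]) for gram in product(ALPHABET, repeat=n)]


def map_features(seq, n=2):
    """Combine both feature mappings into a single vector."""
    return position_specific_features(seq) + ngram_features(seq, n)


def welch_t(a, b):
    """Welch's t statistic for two samples; 0.0 if both have zero variance."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def t_filter(rows, activities, cutoff, threshold):
    """Return indices of features whose |t| between the high-activity and
    low-activity groups is at least `threshold` (hypothetical filter rule).
    Each group needs at least two rows for the variance to be defined."""
    high = [r for r, y in zip(rows, activities) if y >= cutoff]
    low = [r for r, y in zip(rows, activities) if y < cutoff]
    keep = []
    for j in range(len(rows[0])):
        t = welch_t([r[j] for r in high], [r[j] for r in low])
        if abs(t) >= threshold:
            keep.append(j)
    return keep
```

The retained feature columns could then be passed to any SVM regression implementation. Raising `threshold` shrinks the feature set, which is the trade-off the abstract cautions should be approached carefully.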
format Text
id pubmed-1906837
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1906837 2007-07-04 Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features Peek, Andrew S BMC Bioinformatics Research Article BioMed Central 2007-06-06 /pmc/articles/PMC1906837/ /pubmed/17553157 http://dx.doi.org/10.1186/1471-2105-8-182 Text en Copyright © 2007 Peek; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Peek, Andrew S
Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features
title Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features
title_full Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features
title_fullStr Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features
title_full_unstemmed Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features
title_short Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features
title_sort improving model predictions for rna interference activities that use support vector machine regression by combining and filtering features
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1906837/
https://www.ncbi.nlm.nih.gov/pubmed/17553157
http://dx.doi.org/10.1186/1471-2105-8-182
work_keys_str_mv AT peekandrews improvingmodelpredictionsforrnainterferenceactivitiesthatusesupportvectormachineregressionbycombiningandfilteringfeatures