A statistical approach for 5′ splice site prediction using short sequence motifs and without encoding sequence data

BACKGROUND: Most approaches for splice site prediction are based on machine learning techniques. Although these approaches provide high prediction accuracy, they use long window lengths. Hence, they may not be suitable for predicting novel splice variants using the sh...

Bibliographic Details
Main Authors: Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, Atmakuri Ramakrishna; Wahi, Sant Dass
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702320/
https://www.ncbi.nlm.nih.gov/pubmed/25420551
http://dx.doi.org/10.1186/s12859-014-0362-6
author Meher, Prabina Kumar
Sahu, Tanmaya Kumar
Rao, Atmakuri Ramakrishna
Wahi, Sant Dass
collection PubMed
description BACKGROUND: Most approaches for splice site prediction are based on machine learning techniques. Although these approaches provide high prediction accuracy, they use long window lengths. Hence, they may not be suitable for predicting novel splice variants from the short sequence reads generated by next-generation sequencing technologies. Further, machine learning techniques require numerically encoded data and yield different accuracies with different encoding procedures. Therefore, splice site prediction with short sequence motifs and without encoding sequence data became the motivation for the present study. RESULTS: An approach for finding associations among nucleotide bases in the splice site motifs is developed and then used to determine the appropriate window size. In addition, an approach for predicting donor splice sites using the sum of absolute error criterion is proposed. The proposed approach was compared with commonly used approaches, i.e., Maximum Entropy Modeling (MEM), Maximal Dependency Decomposition (MDD), Weighted Matrix Method (WMM) and first-order Markov Model (MM1), and was found to perform on par with MEM and MDD and better than WMM and MM1 in terms of prediction accuracy. CONCLUSIONS: The proposed approach can be used to predict donor splice sites with higher accuracy using short sequence motifs and hence can serve as a complementary method to the existing approaches. Based on the proposed methodology, a web server was also developed for easy prediction of donor splice sites and is available at http://cabgrid.res.in:8080/sspred. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0362-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4702320
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4702320 2016-01-07 BMC Bioinformatics Methodology Article BioMed Central 2014-11-25 /pmc/articles/PMC4702320/ /pubmed/25420551 http://dx.doi.org/10.1186/s12859-014-0362-6 Text en © Meher et al.; licensee BioMed Central. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A statistical approach for 5′ splice site prediction using short sequence motifs and without encoding sequence data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702320/
https://www.ncbi.nlm.nih.gov/pubmed/25420551
http://dx.doi.org/10.1186/s12859-014-0362-6