A statistical approach for 5′ splice site prediction using short sequence motifs and without encoding sequence data
BACKGROUND: Most approaches for splice site prediction are based on machine learning techniques. Although these approaches provide high prediction accuracy, they rely on long sequence windows. Hence, they may not be suitable for predicting novel splice variants using the sh...
Main authors: | Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, Atmakuri Ramakrishna; Wahi, Sant Dass |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702320/ https://www.ncbi.nlm.nih.gov/pubmed/25420551 http://dx.doi.org/10.1186/s12859-014-0362-6 |
_version_ | 1782408617795780608 |
---|---|
author | Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, Atmakuri Ramakrishna; Wahi, Sant Dass |
author_facet | Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, Atmakuri Ramakrishna; Wahi, Sant Dass |
author_sort | Meher, Prabina Kumar |
collection | PubMed |
description | BACKGROUND: Most approaches for splice site prediction are based on machine learning techniques. Although these approaches provide high prediction accuracy, they rely on long sequence windows. Hence, they may not be suitable for predicting novel splice variants from the short sequence reads generated by next-generation sequencing technologies. Further, machine learning techniques require numerically encoded data, and their accuracy varies with the encoding procedure. Therefore, splice site prediction with short sequence motifs and without encoding sequence data was the motivation for the present study. RESULTS: An approach for finding associations among nucleotide bases in splice site motifs was developed and then used to determine the appropriate window size. In addition, an approach for predicting donor splice sites using a sum of absolute error criterion is proposed. The proposed approach was compared with commonly used approaches, i.e., Maximum Entropy Modeling (MEM), Maximal Dependency Decomposition (MDD), Weighted Matrix Method (WMM) and Markov Model of first order (MM1), and was found to perform on par with MEM and MDD and better than WMM and MM1 in terms of prediction accuracy. CONCLUSIONS: The proposed approach can predict donor splice sites with high accuracy using short sequence motifs and can therefore be used as a complementary method to the existing approaches. Based on the proposed methodology, a web server was also developed for easy prediction of donor splice sites by users and is available at http://cabgrid.res.in:8080/sspred. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0362-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4702320 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4702320 2016-01-07 A statistical approach for 5′ splice site prediction using short sequence motifs and without encoding sequence data. Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, Atmakuri Ramakrishna; Wahi, Sant Dass. BMC Bioinformatics. Methodology Article. BioMed Central 2014-11-25 /pmc/articles/PMC4702320/ /pubmed/25420551 http://dx.doi.org/10.1186/s12859-014-0362-6 Text en © Meher et al.; licensee BioMed Central. 2014 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article. Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Rao, Atmakuri Ramakrishna; Wahi, Sant Dass. A statistical approach for 5′ splice site prediction using short sequence motifs and without encoding sequence data |
title | A statistical approach for 5′ splice site prediction using short sequence motifs and without encoding sequence data |
title_sort | statistical approach for 5′ splice site prediction using short sequence motifs and without encoding sequence data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702320/ https://www.ncbi.nlm.nih.gov/pubmed/25420551 http://dx.doi.org/10.1186/s12859-014-0362-6 |
work_keys_str_mv | AT meherprabinakumar astatisticalapproachfor5splicesitepredictionusingshortsequencemotifsandwithoutencodingsequencedata AT sahutanmayakumar astatisticalapproachfor5splicesitepredictionusingshortsequencemotifsandwithoutencodingsequencedata AT raoatmakuriramakrishna astatisticalapproachfor5splicesitepredictionusingshortsequencemotifsandwithoutencodingsequencedata AT wahisantdass astatisticalapproachfor5splicesitepredictionusingshortsequencemotifsandwithoutencodingsequencedata AT meherprabinakumar statisticalapproachfor5splicesitepredictionusingshortsequencemotifsandwithoutencodingsequencedata AT sahutanmayakumar statisticalapproachfor5splicesitepredictionusingshortsequencemotifsandwithoutencodingsequencedata AT raoatmakuriramakrishna statisticalapproachfor5splicesitepredictionusingshortsequencemotifsandwithoutencodingsequencedata AT wahisantdass statisticalapproachfor5splicesitepredictionusingshortsequencemotifsandwithoutencodingsequencedata |
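
The description names the Weighted Matrix Method (WMM) as one of the baseline approaches the authors compared against. As a rough illustration of what such a baseline does with short motifs, here is a minimal sketch in Python. It is not the authors' proposed method (the record gives no details of the sum of absolute error criterion), and the training motifs, function names, pseudocount, and uniform background are all assumptions made up for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_wmm(sequences, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length motifs."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all training motifs must have the same length")
    total = len(sequences) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        # The pseudocount keeps unseen bases from getting probability zero.
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix


def log_odds_score(matrix, motif, background=0.25):
    """Sum of per-position log2 odds against a uniform background (an assumption)."""
    if len(motif) != len(matrix):
        raise ValueError("motif length must match the matrix length")
    return sum(math.log2(matrix[i][base] / background) for i, base in enumerate(motif))


# Hypothetical toy motifs: 3 exon bases followed by 6 intron bases starting with GT.
# These are invented for illustration and are not data from the article.
TRUE_SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTGAGT", "CAGGTATGT"]

wmm = build_wmm(TRUE_SITES)
candidate_score = log_odds_score(wmm, "CAGGTAAGT")  # resembles the training motifs
decoy_score = log_odds_score(wmm, "CTCCATTCA")      # lacks the GT dinucleotide

print(f"candidate: {candidate_score:.2f}  decoy: {decoy_score:.2f}")
```

A WMM treats positions as independent, which is why methods such as MM1, MDD and MEM, which model dependencies between positions, are usually listed as stronger baselines. In practice a score threshold would be chosen on held-out true and false sites rather than by comparing two hand-picked motifs.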