Splice site identification using probabilistic parameters and SVM classification

BACKGROUND: Recent advances and automation in DNA sequencing technology have created a vast amount of DNA sequence data. This growth in sequence data demands better and more efficient analysis methods. Identifying genes in this newly accumulated data is an important issue in bioinformatics, and...

Bibliographic Details
Main authors:	Baten, AKMA, Chang, BCH, Halgamuge, SK, Li, Jason
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1764471/ https://www.ncbi.nlm.nih.gov/pubmed/17254299 http://dx.doi.org/10.1186/1471-2105-7-S5-S15

author	Baten, AKMA Chang, BCH Halgamuge, SK Li, Jason
collection	PubMed
description	BACKGROUND: Recent advances and automation in DNA sequencing technology have created a vast amount of DNA sequence data. This growth in sequence data demands better and more efficient analysis methods. Identifying genes in this newly accumulated data is an important issue in bioinformatics, and it requires the prediction of the complete gene structure. Accurate identification of splice sites in DNA sequences plays a central role in gene structure prediction in eukaryotes. Effective detection of splice sites requires knowledge of the characteristics, dependencies, and relationships of nucleotides in the region surrounding the splice site. Higher-order Markov models are generally regarded as a useful technique for modeling higher-order dependencies. However, their implementation requires estimating a large number of parameters, which is computationally expensive. RESULTS: The proposed method for splice site detection consists of two stages: a first-order Markov model (MM1) is used in the first stage and a support vector machine (SVM) with a polynomial kernel is used in the second stage. The MM1 serves as a pre-processing step for the SVM and takes DNA sequences as its input. It models the compositional features and dependencies of nucleotides around splice site regions in terms of probabilistic parameters. The probabilistic parameters are then fed into the SVM, which combines them nonlinearly to predict splice sites. When the proposed MM1-SVM model is compared with other existing standard splice site detection methods, it shows superior performance in all cases. CONCLUSION: We propose an effective pre-processing scheme for the SVM and apply it to the identification of splice sites. This is a simple yet effective splice site detection method, which shows better classification accuracy and computational speed than some other more complex methods.
format	Text
id	pubmed-1764471
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1764471 2007-01-09 Splice site identification using probabilistic parameters and SVM classification Baten, AKMA Chang, BCH Halgamuge, SK Li, Jason BMC Bioinformatics Proceedings BioMed Central 2006-12-18 /pmc/articles/PMC1764471/ /pubmed/17254299 http://dx.doi.org/10.1186/1471-2105-7-S5-S15 Text en Copyright © 2006 Baten et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Splice site identification using probabilistic parameters and SVM classification
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1764471/ https://www.ncbi.nlm.nih.gov/pubmed/17254299 http://dx.doi.org/10.1186/1471-2105-7-S5-S15
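
The pre-processing stage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the MM1 is position-specific, that it is trained on aligned, equal-length windows containing only A, C, G and T, and that counts are smoothed with a pseudocount; none of these details are stated in this record. The names train_mm1 and mm1_features and the toy sequences are hypothetical. The SVM stage is not shown: in a two-stage setup of this kind, the resulting probability vectors would be passed to an SVM with a polynomial kernel.

```python
from itertools import product

ALPHABET = "ACGT"


def train_mm1(seqs, pseudocount=1.0):
    """Estimate a position-specific first-order Markov model from aligned,
    equal-length DNA windows (assumed to contain only A, C, G, T)."""
    length = len(seqs[0])
    # Smoothed counts for the base at the first position.
    first = {b: pseudocount for b in ALPHABET}
    # One 4x4 table of smoothed transition counts per adjacent position pair.
    trans = [
        {pair: pseudocount for pair in product(ALPHABET, repeat=2)}
        for _ in range(length - 1)
    ]
    for s in seqs:
        first[s[0]] += 1
        for i in range(1, length):
            trans[i - 1][(s[i - 1], s[i])] += 1
    # Normalise counts into probabilities.
    total = sum(first.values())
    first_p = {b: c / total for b, c in first.items()}
    trans_p = []
    for table in trans:
        row_sum = {a: sum(table[(a, b)] for b in ALPHABET) for a in ALPHABET}
        trans_p.append({(a, b): c / row_sum[a] for (a, b), c in table.items()})
    return first_p, trans_p


def mm1_features(seq, model):
    """Map a sequence to one probability per position: P(x_0), then
    P(x_i | x_{i-1}) for each later position."""
    first_p, trans_p = model
    feats = [first_p[seq[0]]]
    feats += [trans_p[i - 1][(seq[i - 1], seq[i])] for i in range(1, len(seq))]
    return feats


# Toy, made-up training windows (not data from the paper).
true_sites = ["AGGT", "AGGT", "CGGT", "AGGA"]
model = train_mm1(true_sites)
features = mm1_features("AGGT", model)
```

Each candidate window becomes a fixed-length numeric vector, which is the form a kernel classifier expects. A first-order model needs only 16 transition parameters per position pair, which is consistent with the abstract's point that higher-order models require many more parameters.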
