
Accurate splice site prediction using support vector machines


Bibliographic Details
Main Authors: Sonnenburg, Sören; Schweikert, Gabriele; Philips, Petra; Behr, Jonas; Rätsch, Gunnar
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2230508/
https://www.ncbi.nlm.nih.gov/pubmed/18269701
http://dx.doi.org/10.1186/1471-2105-8-S10-S7
author Sonnenburg, Sören
Schweikert, Gabriele
Philips, Petra
Behr, Jonas
Rätsch, Gunnar
collection PubMed
description BACKGROUND: For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov chains to solve these tasks. RESULTS: In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel, which turns out to be well suited for this task, as we illustrate in several experiments comparing its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods, including Markov chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be incorporated into a gene finder. AVAILABILITY: Data, splits, additional information on the model selection, the whole-genome predictions, as well as the stand-alone prediction tool are available for download at .
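The description names the weighted degree kernel without defining it. As a rough illustration only, the sketch below assumes the definition commonly given for this kernel: for two equal-length sequences x and x' of length L and a maximum k-mer length D, k(x, x') = sum over d = 1..D of beta_d times the number of positions l at which the length-d substrings of x and x' starting at l are identical, with beta_d = 2(D - d + 1) / (D(D + 1)). The function names (`wd_weights`, `weighted_degree_kernel`, `gram_matrix`), the default degree of 3 and the toy sequences are hypothetical choices for this example; they are not the authors' stand-alone tool or their tuned settings.

```python
from typing import List, Sequence


def wd_weights(degree: int) -> List[float]:
    """Assumed weights beta_d = 2(D - d + 1) / (D(D + 1)) for d = 1..D (they sum to 1)."""
    D = degree
    return [2.0 * (D - d + 1) / (D * (D + 1)) for d in range(1, D + 1)]


def weighted_degree_kernel(x: str, y: str, degree: int = 3) -> float:
    """Count substrings of length 1..degree that match at the same position in x and y,
    weighting a match of length d by beta_d (hypothetical helper, not the authors' code)."""
    if len(x) != len(y):
        raise ValueError("weighted degree kernel expects equal-length sequences")
    L = len(x)
    total = 0.0
    for d, beta in enumerate(wd_weights(degree), start=1):
        # Positional comparison: only substrings starting at the same offset l are compared.
        matches = sum(1 for l in range(L - d + 1) if x[l:l + d] == y[l:l + d])
        total += beta * matches
    return total


def gram_matrix(seqs: Sequence[str], degree: int = 3) -> List[List[float]]:
    """Pairwise kernel values for a list of equal-length sequences (e.g. windows around a candidate site)."""
    return [[weighted_degree_kernel(a, b, degree) for b in seqs] for a in seqs]


if __name__ == "__main__":
    print(round(weighted_degree_kernel("ACGT", "ACGA"), 4))  # 2.3333
```

Because the kernel only compares substrings at identical offsets, it rewards position-specific sequence signals of the kind found around a splice site. A Gram matrix built this way could be handed to any SVM implementation that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`); the training and model selection used in the paper are not reproduced here.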
format Text
id pubmed-2230508
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2230508 2008-05-09 Accurate splice site prediction using support vector machines Sonnenburg, Sören; Schweikert, Gabriele; Philips, Petra; Behr, Jonas; Rätsch, Gunnar BMC Bioinformatics Proceedings BioMed Central 2007-12-21 /pmc/articles/PMC2230508/ /pubmed/18269701 http://dx.doi.org/10.1186/1471-2105-8-S10-S7 Text en Copyright © 2007 Sonnenburg et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Accurate splice site prediction using support vector machines
topic Proceedings