Learning Interpretable SVMs for Biological Sequence Classification

BACKGROUND: Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy they lack interpretability. In many applications, it does not suffice that an algorithm just...

Bibliographic Details
Main Authors: Rätsch, Gunnar, Sonnenburg, Sören, Schäfer, Christin
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810320/
https://www.ncbi.nlm.nih.gov/pubmed/16723012
http://dx.doi.org/10.1186/1471-2105-7-S1-S9
author Rätsch, Gunnar
Sonnenburg, Sören
Schäfer, Christin
author_facet Rätsch, Gunnar
Sonnenburg, Sören
Schäfer, Christin
author_sort Rätsch, Gunnar
collection PubMed
description BACKGROUND: Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy they lack interpretability. In many applications, it does not suffice that an algorithm just detects a biological signal in the sequence, but it should also provide means to interpret its solution in order to gain biological insight. RESULTS: We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination. CONCLUSION: The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time, and reliably identifies a few statistically significant positions.
format Text
id pubmed-1810320
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-18103202007-03-14 Learning Interpretable SVMs for Biological Sequence Classification Rätsch, Gunnar Sonnenburg, Sören Schäfer, Christin BMC Bioinformatics Proceedings BACKGROUND: Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy they lack interpretability. In many applications, it does not suffice that an algorithm just detects a biological signal in the sequence, but it should also provide means to interpret its solution in order to gain biological insight. RESULTS: We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination. CONCLUSION: The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time, and reliably identifies a few statistically significant positions. BioMed Central 2006-03-20 /pmc/articles/PMC1810320/ /pubmed/16723012 http://dx.doi.org/10.1186/1471-2105-7-S1-S9 Text en
spellingShingle Proceedings
Rätsch, Gunnar
Sonnenburg, Sören
Schäfer, Christin
Learning Interpretable SVMs for Biological Sequence Classification
title Learning Interpretable SVMs for Biological Sequence Classification
title_full Learning Interpretable SVMs for Biological Sequence Classification
title_fullStr Learning Interpretable SVMs for Biological Sequence Classification
title_full_unstemmed Learning Interpretable SVMs for Biological Sequence Classification
title_short Learning Interpretable SVMs for Biological Sequence Classification
title_sort learning interpretable svms for biological sequence classification
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1810320/
https://www.ncbi.nlm.nih.gov/pubmed/16723012
http://dx.doi.org/10.1186/1471-2105-7-S1-S9
work_keys_str_mv AT ratschgunnar learninginterpretablesvmsforbiologicalsequenceclassification
AT sonnenburgsoren learninginterpretablesvmsforbiologicalsequenceclassification
AT schaferchristin learninginterpretablesvmsforbiologicalsequenceclassification