SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor

Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, due to their black-box...

Bibliographic Details
Main Authors: Vidovic, Marina M. -C., Görnitz, Nico, Müller, Klaus-Robert, Rätsch, Gunnar, Kloft, Marius
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4686957/
https://www.ncbi.nlm.nih.gov/pubmed/26690911
http://dx.doi.org/10.1371/journal.pone.0144782
author Vidovic, Marina M. -C.
Görnitz, Nico
Müller, Klaus-Robert
Rätsch, Gunnar
Kloft, Marius
collection PubMed
description Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, due to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although POIMs are a major step towards explaining trained SVM models, their size grows exponentially with the motif length, which makes manual inspection feasible only for comparatively small motif sizes, typically k ≤ 5. In this work, we extend positional oligomer importance matrices by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs, regardless of their length and complexity, underlying the predictions of a trained SVM model. Our framework treats the motifs as free parameters in a probabilistic model, which leads to a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address with an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set.
format Online
Article
Text
id pubmed-4686957
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4686957 2016-01-07 PLoS One Research Article
Public Library of Science 2015-12-21 /pmc/articles/PMC4686957/ /pubmed/26690911 http://dx.doi.org/10.1371/journal.pone.0144782 Text en © 2015 Vidovic et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title SVM2Motif—Reconstructing Overlapping DNA Sequence Motifs by Mimicking an SVM Predictor
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4686957/
https://www.ncbi.nlm.nih.gov/pubmed/26690911
http://dx.doi.org/10.1371/journal.pone.0144782