ccSVM: correcting Support Vector Machines for confounding factors in biological data classification

Motivation: Classifying biological data into different groups is a central task of bioinformatics: for instance, to predict the function of a gene or protein, the disease state of a patient or the phenotype of an individual based on its genotype. Support Vector Machines are a widespread approach fo...

Bibliographic Details
Main Authors: Li, Limin; Rakitsch, Barbara; Borgwardt, Karsten
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117385/
https://www.ncbi.nlm.nih.gov/pubmed/21685091
http://dx.doi.org/10.1093/bioinformatics/btr204
author Li, Limin
Rakitsch, Barbara
Borgwardt, Karsten
collection PubMed
description Motivation: Classifying biological data into different groups is a central task of bioinformatics: for instance, to predict the function of a gene or protein, the disease state of a patient or the phenotype of an individual based on its genotype. Support Vector Machines are a widespread approach for classifying biological data, owing to their high accuracy, their ability to deal with structured data such as strings, and the ease of integrating various types of data. However, it is unclear how to correct for confounding factors such as population structure, age, gender or experimental conditions in Support Vector Machine classification. Results: In this article, we present a Support Vector Machine classifier that can correct the prediction for observed confounding factors. This is achieved by minimizing the statistical dependence between the classifier and the confounding factors. We prove that this formulation can be transformed into a standard Support Vector Machine with rescaled input data. In our experiments, our confounder-correcting SVM (ccSVM) improves tumor diagnosis based on samples from different labs, tuberculosis diagnosis in patients of varying age, ethnicity and gender, and phenotype prediction in the presence of population structure, and it outperforms state-of-the-art methods in terms of prediction accuracy. Availability: A ccSVM implementation in MATLAB is available from http://webdav.tuebingen.mpg.de/u/karsten/Forschung/ISMB11_ccSVM/. Contact: limin.li@tuebingen.mpg.de; karsten.borgwardt@tuebingen.mpg.de
format Online
Article
Text
id pubmed-3117385
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3117385 2011-06-17
ccSVM: correcting Support Vector Machines for confounding factors in biological data classification
Li, Limin
Rakitsch, Barbara
Borgwardt, Karsten
Bioinformatics
Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria
Oxford University Press 2011-07-01 2011-06-14
/pmc/articles/PMC3117385/
/pubmed/21685091
http://dx.doi.org/10.1093/bioinformatics/btr204
Text en
© The Author(s) 2011. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ccSVM: correcting Support Vector Machines for confounding factors in biological data classification
topic Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria