ccSVM: correcting Support Vector Machines for confounding factors in biological data classification
Motivation: Classifying biological data into different groups is a central task of bioinformatics: for instance, to predict the function of a gene or protein, the disease state of a patient or the phenotype of an individual based on its genotype. Support Vector Machines are a widespread approach fo...
Main authors: | Li, Limin; Rakitsch, Barbara; Borgwardt, Karsten |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2011 |
Subjects: | Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117385/ https://www.ncbi.nlm.nih.gov/pubmed/21685091 http://dx.doi.org/10.1093/bioinformatics/btr204 |
_version_ | 1782206327028711424 |
---|---|
author | Li, Limin Rakitsch, Barbara Borgwardt, Karsten |
author_facet | Li, Limin Rakitsch, Barbara Borgwardt, Karsten |
author_sort | Li, Limin |
collection | PubMed |
description | Motivation: Classifying biological data into different groups is a central task of bioinformatics: for instance, to predict the function of a gene or protein, the disease state of a patient or the phenotype of an individual based on its genotype. Support Vector Machines are a widespread approach for classifying biological data, due to their high accuracy, their ability to deal with structured data such as strings, and the ease of integrating various types of data. However, it is unclear how to correct for confounding factors such as population structure, age, gender or experimental conditions in Support Vector Machine classification. Results: In this article, we present a Support Vector Machine classifier that can correct the prediction for observed confounding factors. This is achieved by minimizing the statistical dependence between the classifier and the confounding factors. We prove that this formulation can be transformed into a standard Support Vector Machine with rescaled input data. In our experiments, our confounder-correcting SVM (ccSVM) improves tumor diagnosis based on samples from different labs, tuberculosis diagnosis in patients of varying age, ethnicity and gender, and phenotype prediction in the presence of population structure, and outperforms state-of-the-art methods in terms of prediction accuracy. Availability: A ccSVM implementation in MATLAB is available from http://webdav.tuebingen.mpg.de/u/karsten/Forschung/ISMB11_ccSVM/. Contact: limin.li@tuebingen.mpg.de; karsten.borgwardt@tuebingen.mpg.de |
format | Online Article Text |
id | pubmed-3117385 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3117385 2011-06-17 ccSVM: correcting Support Vector Machines for confounding factors in biological data classification Li, Limin Rakitsch, Barbara Borgwardt, Karsten Bioinformatics Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria Motivation: Classifying biological data into different groups is a central task of bioinformatics: for instance, to predict the function of a gene or protein, the disease state of a patient or the phenotype of an individual based on its genotype. Support Vector Machines are a widespread approach for classifying biological data, due to their high accuracy, their ability to deal with structured data such as strings, and the ease of integrating various types of data. However, it is unclear how to correct for confounding factors such as population structure, age, gender or experimental conditions in Support Vector Machine classification. Results: In this article, we present a Support Vector Machine classifier that can correct the prediction for observed confounding factors. This is achieved by minimizing the statistical dependence between the classifier and the confounding factors. We prove that this formulation can be transformed into a standard Support Vector Machine with rescaled input data. In our experiments, our confounder-correcting SVM (ccSVM) improves tumor diagnosis based on samples from different labs, tuberculosis diagnosis in patients of varying age, ethnicity and gender, and phenotype prediction in the presence of population structure, and outperforms state-of-the-art methods in terms of prediction accuracy. Availability: A ccSVM implementation in MATLAB is available from http://webdav.tuebingen.mpg.de/u/karsten/Forschung/ISMB11_ccSVM/. Contact: limin.li@tuebingen.mpg.de; karsten.borgwardt@tuebingen.mpg.de Oxford University Press 2011-07-01 2011-06-14 /pmc/articles/PMC3117385/ /pubmed/21685091 http://dx.doi.org/10.1093/bioinformatics/btr204 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria Li, Limin Rakitsch, Barbara Borgwardt, Karsten ccSVM: correcting Support Vector Machines for confounding factors in biological data classification |
title | ccSVM: correcting Support Vector Machines for confounding factors in biological data classification |
title_full | ccSVM: correcting Support Vector Machines for confounding factors in biological data classification |
title_fullStr | ccSVM: correcting Support Vector Machines for confounding factors in biological data classification |
title_full_unstemmed | ccSVM: correcting Support Vector Machines for confounding factors in biological data classification |
title_short | ccSVM: correcting Support Vector Machines for confounding factors in biological data classification |
title_sort | ccsvm: correcting support vector machines for confounding factors in biological data classification |
topic | Ismb/Eccb 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117385/ https://www.ncbi.nlm.nih.gov/pubmed/21685091 http://dx.doi.org/10.1093/bioinformatics/btr204 |
work_keys_str_mv | AT lilimin ccsvmcorrectingsupportvectormachinesforconfoundingfactorsinbiologicaldataclassification AT rakitschbarbara ccsvmcorrectingsupportvectormachinesforconfoundingfactorsinbiologicaldataclassification AT borgwardtkarsten ccsvmcorrectingsupportvectormachinesforconfoundingfactorsinbiologicaldataclassification |
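
The description field in this record states that ccSVM minimizes the statistical dependence between the classifier and the confounding factors. As a rough illustration of what measuring such dependence can look like, the Python sketch below computes the biased empirical Hilbert-Schmidt Independence Criterion (HSIC) between a kernel matrix on the inputs and a kernel matrix on a confounder. The choice of HSIC, the Gaussian kernel, and all function names and toy data are assumptions made for illustration only. The record does not specify the dependence measure, and this is not the authors' MATLAB implementation.

```python
import math


def rbf_kernel(rows, gamma=1.0):
    """Gaussian kernel matrix for a list of equal-length feature vectors."""
    n = len(rows)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def center(K):
    """Double-center a kernel matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    # trace(Kc Lc) equals trace(K H L H) because H is idempotent.
    trace = sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Hypothetical toy data: one feature per sample and two candidate confounders.
features = [[0.0], [0.0], [1.0], [1.0]]
lab_aligned = [[0.0], [0.0], [1.0], [1.0]]    # confounder tracks the feature
lab_unrelated = [[0.0], [1.0], [0.0], [1.0]]  # confounder balanced across feature values

K = rbf_kernel(features)
print("HSIC with aligned confounder:  ", hsic(K, rbf_kernel(lab_aligned)))
print("HSIC with unrelated confounder:", hsic(K, rbf_kernel(lab_unrelated)))
```

A larger value indicates stronger dependence between the two kernels, so in this toy setting the aligned confounder scores clearly higher than the balanced one. A classifier that corrects for confounders would aim to keep this kind of quantity small for its predictions.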