Constraining classifiers in molecular analysis: invariance and robustness

Bibliographic Details
Main authors: Lausser, Ludwig; Szekely, Robin; Klimmek, Attila; Schmid, Florian; Kestler, Hans A.
Format: Online Article Text
Language: English
Published: The Royal Society, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7061712/
https://www.ncbi.nlm.nih.gov/pubmed/32019472
http://dx.doi.org/10.1098/rsif.2019.0612
Collection: PubMed
Description: Analysing molecular profiles requires the selection of classification models that can cope with the high dimensionality and variability of these data. Also, improper reference point choice and scaling pose additional challenges. Often model selection is somewhat guided by ad hoc simulations rather than by sophisticated considerations on the properties of a categorization model. Here, we derive and report four linked linear concept classes/models with distinct invariance properties for high-dimensional molecular classification. We can further show that these concept classes also form a half-order of complexity classes in terms of Vapnik–Chervonenkis dimensions, which also implies increased generalization abilities. We implemented support vector machines with these properties. Surprisingly, we were able to attain comparable or even superior generalization abilities to the standard linear one on the 27 investigated RNA-Seq and microarray datasets. Our results indicate that a priori chosen invariant models can replace ad hoc robustness analysis by interpretable and theoretically guaranteed properties in molecular categorization.
Record ID: pubmed-7061712
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J R Soc Interface
Publication dates: 2020-02; 2020-02-05
License: © 2020 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
Subject: Life Sciences–Mathematics interface