Cargando…

Constraining classifiers in molecular analysis: invariance and robustness

Analysing molecular profiles requires the selection of classification models that can cope with the high dimensionality and variability of these data. Also, improper reference point choice and scaling pose additional challenges. Often model selection is somewhat guided by ad hoc simulations rather t...

Descripción completa

Detalles Bibliográficos
Autores principales: Lausser, Ludwig, Szekely, Robin, Klimmek, Attila, Schmid, Florian, Kestler, Hans A.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: The Royal Society 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7061712/
https://www.ncbi.nlm.nih.gov/pubmed/32019472
http://dx.doi.org/10.1098/rsif.2019.0612
Descripción
Sumario:Analysing molecular profiles requires the selection of classification models that can cope with the high dimensionality and variability of these data. Also, improper reference point choice and scaling pose additional challenges. Often model selection is somewhat guided by ad hoc simulations rather than by sophisticated considerations on the properties of a categorization model. Here, we derive and report four linked linear concept classes/models with distinct invariance properties for high-dimensional molecular classification. We can further show that these concept classes also form a half-order of complexity classes in terms of Vapnik–Chervonenkis dimensions, which also implies increased generalization abilities. We implemented support vector machines with these properties. Surprisingly, we were able to attain comparable or even superior generalization abilities to the standard linear one on the 27 investigated RNA-Seq and microarray datasets. Our results indicate that a priori chosen invariant models can replace ad hoc robustness analysis by interpretable and theoretically guaranteed properties in molecular categorization.