
A network approach for low dimensional signatures from high throughput data



Bibliographic Details
Main authors: Curti, Nico; Levi, Giuseppe; Giampieri, Enrico; Castellani, Gastone; Remondini, Daniel
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9789141/
https://www.ncbi.nlm.nih.gov/pubmed/36564421
http://dx.doi.org/10.1038/s41598-022-25549-9
Description
Summary: One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables (a signature) for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by an up/down regulation behavior, for which discriminant-based methods could perform with high accuracy and offer easy interpretability. To get the most out of these methods, feature selection is even more critical, but it is known to be an NP-hard problem; thus most feature selection approaches focus on one feature at a time (k-best, Sequential Feature Selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best-performing feature pairs. The algorithm is easily scalable, allowing efficient computation for a high number of observables ([Formula: see text]–[Formula: see text]). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is comparable to them while selecting a smaller number of features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures compared with nonlinear classification models.
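The summary describes the pair-based network heuristic only at a high level. As a rough illustration, the following minimal Python sketch assumes that (1) each feature pair is scored by the leave-one-out accuracy of a diagonal Gaussian (quadratic) discriminant, (2) the network is built by linking the features of a hypothetical `top_k` best pairs, and (3) each connected component of that network is read as one candidate signature. The function names, the `top_k` cut-off, and the toy data are hypothetical; this is not the DNetPRO reference implementation, and the paper's actual scoring and network-pruning rules may differ.

```python
import math
from itertools import combinations


def fit_diag_qda(rows, labels, eps=1e-6):
    """Fit a diagonal Gaussian discriminant: per-class means, variances, log-priors."""
    model, n = {}, len(labels)
    for c in sorted(set(labels)):
        pts = [r for r, lab in zip(rows, labels) if lab == c]
        d = len(pts[0])
        means = [sum(p[j] for p in pts) / len(pts) for j in range(d)]
        # eps keeps variances strictly positive (e.g. single-sample classes)
        variances = [sum((p[j] - means[j]) ** 2 for p in pts) / len(pts) + eps
                     for j in range(d)]
        model[c] = (means, variances, math.log(len(pts) / n))
    return model


def predict_diag_qda(model, x):
    """Return the class with the highest Gaussian log-posterior (up to a constant)."""
    def log_score(c):
        means, variances, log_prior = model[c]
        return log_prior - 0.5 * sum(math.log(v) + (xi - m) ** 2 / v
                                     for xi, m, v in zip(x, means, variances))
    return max(model, key=log_score)


def pair_accuracy(X, y, i, j):
    """Leave-one-out accuracy of the diagonal discriminant restricted to features i, j."""
    rows = [(sample[i], sample[j]) for sample in X]
    hits = 0
    for k in range(len(rows)):
        model = fit_diag_qda(rows[:k] + rows[k + 1:], y[:k] + y[k + 1:])
        hits += predict_diag_qda(model, rows[k]) == y[k]
    return hits / len(rows)


def network_signatures(X, y, top_k):
    """Hypothetical helper: score all pairs, link the top_k best, return components."""
    n_features = len(X[0])
    # Sort by accuracy (descending), breaking ties by feature indices
    scored = sorted(((pair_accuracy(X, y, i, j), i, j)
                     for i, j in combinations(range(n_features), 2)),
                    key=lambda t: (-t[0], t[1], t[2]))
    adjacency = {}
    for _, i, j in scored[:top_k]:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)
    # Each connected component of the pair network is one candidate signature
    seen, signatures = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        signatures.append(sorted(component))
    return scored, signatures


# Toy data (hypothetical): features 0 and 1 separate the classes, 2 and 3 are noise
X = [[0.1, 0.2, 0.5, 0.3], [0.2, 0.1, 0.4, 0.9], [0.0, 0.3, 0.6, 0.1],
     [0.3, 0.0, 0.2, 0.5], [1.0, 1.1, 0.5, 0.4], [1.2, 0.9, 0.3, 0.8],
     [0.9, 1.0, 0.6, 0.2], [1.1, 1.2, 0.4, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

scored, signatures = network_signatures(X, y, top_k=1)
print(scored[0])    # → (1.0, 0, 1)
print(signatures)   # → [[0, 1]]
```

In this sketch the dominant cost is scoring every pair, which takes p(p-1)/2 classifier evaluations for p features; that exhaustive pair scan is the step a scalable implementation would need to vectorize or parallelize.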