
Parenclitic and Synolytic Networks Revisited

Parenclitic networks provide a powerful and relatively new way to coerce multidimensional data into a graph form, enabling the application of graph theory to evaluate features. Different algorithms have been published for constructing parenclitic networks, leading to the question—which algorithm sho...


Bibliographic Details
Main authors: Nazarenko, Tatiana, Whitwell, Harry J., Blyuss, Oleg, Zaikin, Alexey
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8564045/
https://www.ncbi.nlm.nih.gov/pubmed/34745212
http://dx.doi.org/10.3389/fgene.2021.733783
author Nazarenko, Tatiana
Whitwell, Harry J.
Blyuss, Oleg
Zaikin, Alexey
collection PubMed
description Parenclitic networks provide a powerful and relatively new way to coerce multidimensional data into graph form, enabling the application of graph theory to evaluate features. Different algorithms have been published for constructing parenclitic networks, which raises the question of which algorithm should be chosen. Initially, it was suggested to calculate the weight of an edge between two nodes of the network as the deviation from a linear regression fitted to the dependence of one of these features on the other. This method works well, but not when the relationship between the features is non-linear. To overcome this, it was suggested to calculate edge weights as the distance from the area of most probable values, estimated using kernel density estimation. In both of these approaches only one class (typically controls or a healthy population) is used to construct the model. To take account of a second class, we have introduced synolytic networks, which use a boundary between the two classes on the feature-feature plane to estimate the weight of the edge between these features. Common to all these approaches is that topological indices can be used to evaluate the structure represented by the graphs. To compare these network approaches with more traditional machine-learning (ML) algorithms, we performed a substantial analysis using both synthetic data with a priori known structure and publicly available datasets used for benchmarking ML algorithms. The comparison showed that the main advantage of parenclitic and synolytic networks is their resistance to over-fitting (which occurs when the number of features is greater than the number of subjects) compared to other ML approaches. A second advantage is the capability to visualise data in a structured form, even when this structure is not available a priori, which allows for visual inspection and the application of well-established graph theory to their interpretation and application, eliminating the “black-box” nature of other ML approaches.
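The regression-based construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `parenclitic_edge_weights` and `weighted_degree`, the normalisation of each deviation by the residual standard deviation of the controls, and the toy data are all assumptions made for illustration. For every pair of features, a line is fitted on the control class only, and the edge weight for a subject is that subject's scaled distance from the line.

```python
import numpy as np


def parenclitic_edge_weights(controls, subject):
    """Edge weights of a regression-based parenclitic network for one subject.

    controls: (n_controls, n_features) array, the single reference class.
    subject:  (n_features,) array, the subject to be mapped to a graph.
    Returns a symmetric (n_features, n_features) matrix with a zero diagonal.
    """
    n_features = controls.shape[1]
    weights = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            x, y = controls[:, i], controls[:, j]
            # Linear regression of feature j on feature i, controls only.
            slope, intercept = np.polyfit(x, y, 1)
            residuals = y - (slope * x + intercept)
            # Assumption: scale by the residual spread of the controls.
            sigma = residuals.std(ddof=2)
            deviation = subject[j] - (slope * subject[i] + intercept)
            weights[i, j] = weights[j, i] = abs(deviation) / sigma
    return weights


def weighted_degree(weights):
    """A simple topological index: the sum of edge weights at each node."""
    return weights.sum(axis=1)


# Toy data (hypothetical): feature 1 depends linearly on feature 0,
# feature 2 is independent noise.
rng = np.random.default_rng(0)
f0 = rng.normal(size=200)
controls = np.column_stack([
    f0,
    2.0 * f0 + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

typical = np.array([1.0, 2.0, 0.0])     # follows the control relationship
atypical = np.array([1.0, -2.0, 0.0])   # breaks it on the (0, 1) plane

w_typical = parenclitic_edge_weights(controls, typical)
w_atypical = parenclitic_edge_weights(controls, atypical)
```

In this sketch a subject that breaks the control relationship between features 0 and 1 receives a much heavier edge between those two nodes, and indices such as `weighted_degree` then summarise the resulting graph. The kernel-density and synolytic variants mentioned in the abstract would replace the fitted line with a density estimate or a two-class boundary, respectively.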
format Online
Article
Text
id pubmed-8564045
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8564045 2021-11-04 Parenclitic and Synolytic Networks Revisited Nazarenko, Tatiana Whitwell, Harry J. Blyuss, Oleg Zaikin, Alexey Front Genet Genetics Frontiers Media S.A. 2021-10-20 /pmc/articles/PMC8564045/ /pubmed/34745212 http://dx.doi.org/10.3389/fgene.2021.733783 Text en Copyright © 2021 Nazarenko, Whitwell, Blyuss and Zaikin. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Parenclitic and Synolytic Networks Revisited
topic Genetics