Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study
We describe a network-based method to obtain a subset of representative variables from clinical data of subjects of the second Singapore Longitudinal Aging Study (SLAS-2), while preserving to a good extent the predictive performance of the full set with regards to a multi-faceted index of successful...
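Main authors: | Valenzuela, Jesus Felix Bayta; Monterola, Christopher; Tong, Victor Joo Chuan; Fülöp, Tamàs; Ng, Tze Pin; Larbi, Anis |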
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6638841/ https://www.ncbi.nlm.nih.gov/pubmed/31318894 http://dx.doi.org/10.1371/journal.pone.0219186 |
_version_ | 1783436370380849152 |
---|---|
author | Valenzuela, Jesus Felix Bayta; Monterola, Christopher; Tong, Victor Joo Chuan; Fülöp, Tamàs; Ng, Tze Pin; Larbi, Anis |
author_facet | Valenzuela, Jesus Felix Bayta; Monterola, Christopher; Tong, Victor Joo Chuan; Fülöp, Tamàs; Ng, Tze Pin; Larbi, Anis |
author_sort | Valenzuela, Jesus Felix Bayta |
collection | PubMed |
description | We describe a network-based method to obtain a subset of representative variables from clinical data of subjects of the second Singapore Longitudinal Aging Study (SLAS-2), while preserving to a good extent the predictive performance of the full set with regards to a multi-faceted index of successful aging, SAGE. To examine differences in predictive performance of high-degree nodes (“hubs”) and high-centrality ones (“cores”), we implement four subsetting strategies (two degree-based, two centrality-based) and obtain four surrogate sets of variables, which we use as input features for machine learning models to predict the SAGE index of subjects. All four models have variables belonging to the physical, cardiovascular, cognitive and immunological domains among their fifteen most important predictors. A fifth domain (leisure-time activities, LTA) is also present in some form. From a comparison of the surrogate sets’ size and predictive performance, a centrality-based approach (selection of the most central variable-nodes within each cluster) yielded the smallest-sized surrogate set, while having high prediction accuracy (measured by its model’s area-under-curve, AUC) in comparison to its analogous degree-based strategy (selection of the highest-degree nodes per cluster). Inclusion of the next most-central variables yielded negligible changes in predictive performance while more than doubling the surrogate set size. The centrality-based approach thus yields a surrogate set which offers a good balance between number of variables and prediction performance, and can act as a representative subset of the SLAS-2 clinical dataset. |
format | Online Article Text |
id | pubmed-6638841 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-66388412019-07-25 Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study Valenzuela, Jesus Felix Bayta Monterola, Christopher Tong, Victor Joo Chuan Fülöp, Tamàs Ng, Tze Pin Larbi, Anis PLoS One Research Article We describe a network-based method to obtain a subset of representative variables from clinical data of subjects of the second Singapore Longitudinal Aging Study (SLAS-2), while preserving to a good extent the predictive performance of the full set with regards to a multi-faceted index of successful aging, SAGE. To examine differences in predictive performance of high-degree nodes (“hubs”) and high-centrality ones (“cores”), we implement four subsetting strategies (two degree-based, two centrality-based) and obtain four surrogate sets of variables, which we use as input features for machine learning models to predict the SAGE index of subjects. All four models have variables belonging to the physical, cardiovascular, cognitive and immunological domains among their fifteen most important predictors. A fifth domain (leisure-time activities, LTA) is also present in some form. From a comparison of the surrogate sets’ size and predictive performance, a centrality-based approach (selection of the most central variable-nodes within each cluster) yielded the smallest-sized surrogate set, while having high prediction accuracy (measured by its model’s area-under-curve, AUC) in comparison to its analogous degree-based strategy (selection of the highest-degree nodes per cluster). Inclusion of the next most-central variables yielded negligible changes in predictive performance while more than doubling the surrogate set size. The centrality-based approach thus yields a surrogate set which offers a good balance between number of variables and prediction performance, and can act as a representative subset of the SLAS-2 clinical dataset. 
Public Library of Science 2019-07-18 /pmc/articles/PMC6638841/ /pubmed/31318894 http://dx.doi.org/10.1371/journal.pone.0219186 Text en © 2019 Valenzuela et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Valenzuela, Jesus Felix Bayta Monterola, Christopher Tong, Victor Joo Chuan Fülöp, Tamàs Ng, Tze Pin Larbi, Anis Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study |
title | Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study |
title_full | Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study |
title_fullStr | Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study |
title_full_unstemmed | Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study |
title_short | Degree and centrality-based approaches in network-based variable selection: Insights from the Singapore Longitudinal Aging Study |
title_sort | degree and centrality-based approaches in network-based variable selection: insights from the singapore longitudinal aging study |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6638841/ https://www.ncbi.nlm.nih.gov/pubmed/31318894 http://dx.doi.org/10.1371/journal.pone.0219186 |
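
The description in this record contrasts two per-cluster selection rules: picking the highest-degree node ("hub") and picking the most central node ("core"). The sketch below illustrates that contrast on a toy graph and is not the authors' implementation. It assumes the cluster assignment is already given, and it uses closeness centrality as a stand-in because the abstract does not say which centrality measure was used. The node names, the edge list, and the helpers `build_adjacency`, `degree`, `closeness`, and `select_surrogates` are all hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node, members):
    """Count the neighbours of `node` that lie inside the cluster `members`."""
    return sum(1 for nb in adj[node] if nb in members)


def closeness(adj, node, members):
    """Closeness of `node` in the subgraph induced by `members`.

    Assumption: closeness stands in for the unspecified centrality measure.
    Distances come from a breadth-first search restricted to the cluster.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in adj[cur]:
            if nb in members and nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    total = sum(dist.values())
    if total == 0:
        return 0.0  # isolated node or single-member cluster
    reached = len(dist) - 1
    # Scale by the fraction of the cluster reached, so nodes in a
    # disconnected cluster are not rewarded for reaching few others.
    return (reached / total) * (reached / (len(members) - 1))


def select_surrogates(adj, clusters, score):
    """Pick one representative per cluster: the node with the highest score.

    Ties go to the alphabetically first node so the result is deterministic.
    """
    chosen = {}
    for label, members in clusters.items():
        members = set(members)
        chosen[label] = max(sorted(members), key=lambda n: score(adj, n, members))
    return chosen


# Toy graph (hypothetical): in cluster "A" the node "hub" has many leaf
# neighbours, but "m1" sits closer to the middle of the cluster.
edges = [
    ("hub", "a1"), ("hub", "a2"), ("hub", "a3"), ("hub", "m1"),
    ("m1", "m2"), ("m2", "m3"), ("m3", "m4"), ("m4", "m5"),
    ("x", "y"), ("y", "z"), ("x", "z"),
    ("m5", "x"),  # bridge between clusters, ignored inside each cluster
]
clusters = {
    "A": ["hub", "a1", "a2", "a3", "m1", "m2", "m3", "m4", "m5"],
    "B": ["x", "y", "z"],
}

adj = build_adjacency(edges)
hubs = select_surrogates(adj, clusters, degree)      # degree-based choice
cores = select_surrogates(adj, clusters, closeness)  # centrality-based choice
print("hubs:", hubs)
print("cores:", cores)
```

On this toy graph the two rules disagree for cluster "A" (`hub` by degree, `m1` by closeness) and agree for cluster "B", which shows how a hub-based and a core-based surrogate set can differ even when the clustering is the same.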