Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets
Main authors: | Krakovska, Olga; Christie, Gregory; Sixsmith, Andrew; Ester, Martin; Moreno, Sylvain |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6428288/ https://www.ncbi.nlm.nih.gov/pubmed/30897097 http://dx.doi.org/10.1371/journal.pone.0213584 |
author | Krakovska, Olga; Christie, Gregory; Sixsmith, Andrew; Ester, Martin; Moreno, Sylvain |
---|---|
collection | PubMed |
description | Large survey databases for aging-related analysis are often examined to discover key factors that affect a dependent variable of interest. Typically, this analysis is performed with methods that assume linear dependencies between variables. Such assumptions, however, do not hold in many cases, wherein data are linked by non-linear dependencies. This in turn calls for analytic methods that are more accurate in identifying potentially non-linear dependencies. Here, we objectively compared the feature selection performance of several frequently used linear selection methods and three non-linear selection methods in the context of large survey data. These methods were assessed using both synthetic and real-world datasets, wherein relationships between the features and dependent variables were known in advance. We found that the non-linear methods offered better overall feature selection performance than the linear methods in all usage conditions. Moreover, the performance of the non-linear methods was more stable, being unaffected by the inclusion or exclusion of variables from the datasets. These properties make non-linear feature selection methods a potentially preferable tool for both hypothesis-driven and exploratory analyses of aging-related datasets. |
format | Online Article Text |
id | pubmed-6428288 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6428288 2019-04-02 PLoS One Research Article. Public Library of Science 2019-03-21 /pmc/articles/PMC6428288/ /pubmed/30897097 http://dx.doi.org/10.1371/journal.pone.0213584 Text en © 2019 Krakovska et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6428288/ https://www.ncbi.nlm.nih.gov/pubmed/30897097 http://dx.doi.org/10.1371/journal.pone.0213584 |