
FARMS: A New Algorithm for Variable Selection

Large datasets including an extensive number of covariates are generated these days in many different situations, for instance, in detailed genetic studies of outbred human populations or in complex analyses of immune responses to different infections. Aiming at informing clinical interventions or...


Bibliographic Details
Main Authors: Perez-Alvarez, Susana; Gómez, Guadalupe; Brander, Christian
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4529908/
https://www.ncbi.nlm.nih.gov/pubmed/26273608
http://dx.doi.org/10.1155/2015/319797
author Perez-Alvarez, Susana
Gómez, Guadalupe
Brander, Christian
collection PubMed
description Large datasets with many covariates are now generated in many settings, for instance in detailed genetic studies of outbred human populations or in complex analyses of immune responses to different infections. To inform clinical interventions or vaccine design, variable selection methods that identify the variables with optimal predictive performance for a specific outcome are crucial. However, testing all potential subsets of variables is not feasible, and alternatives to existing methods are needed. Here, we describe a new method for such complex datasets, referred to as FARMS, that combines forward and all-subsets regression for model selection. We apply FARMS to a host genetic and immunological dataset of over 800 HIV-infected individuals from Lima (Peru) and Durban (South Africa) who were tested for antiviral immune responses. This dataset includes more than 500 explanatory variables: around 400 variables with information on HIV immune reactivity and around 100 individual genetic characteristics. We have implemented FARMS in the R statistical language and show that it is fast and outperforms other comparable, commonly used approaches, thus providing a new tool for the thorough analysis of complex datasets without the need for massive computational infrastructure.
format Online
Article
Text
id pubmed-4529908
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4529908 2015-08-13 FARMS: A New Algorithm for Variable Selection. Perez-Alvarez, Susana; Gómez, Guadalupe; Brander, Christian. Biomed Res Int. Research Article. Hindawi Publishing Corporation 2015. 2015-07-26 /pmc/articles/PMC4529908/ /pubmed/26273608 http://dx.doi.org/10.1155/2015/319797 Text en Copyright © 2015 Susana Perez-Alvarez et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title FARMS: A New Algorithm for Variable Selection
topic Research Article