Gene set selection via LASSO penalized regression (SLPR)

Gene set testing is an important bioinformatics technique that addresses the challenges of power, interpretation and replication. To better support the analysis of large and highly overlapping gene set collections, researchers have recently developed a number of multiset methods that jointly evaluat...

Bibliographic Details
Main authors: Frost, H. Robert; Amos, Christopher I.
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499546/
https://www.ncbi.nlm.nih.gov/pubmed/28472344
http://dx.doi.org/10.1093/nar/gkx291
author Frost, H. Robert
Amos, Christopher I.
collection PubMed
description Gene set testing is an important bioinformatics technique that addresses the challenges of power, interpretation and replication. To better support the analysis of large and highly overlapping gene set collections, researchers have recently developed a number of multiset methods that jointly evaluate all gene sets in a collection to identify a parsimonious group of functionally independent sets. Unfortunately, current multiset methods all use binary indicators for gene and gene set activity and assume that a gene is active if any containing gene set is active. This simplistic model limits performance on many types of genomic data. To address this limitation, we developed gene set Selection via LASSO Penalized Regression (SLPR), a novel mapping of multiset gene set testing to penalized multiple linear regression. The SLPR method assumes a linear relationship between continuous measures of gene activity and the activity of all gene sets in the collection. As we demonstrate via simulation studies and the analysis of TCGA data using MSigDB gene sets, the SLPR method outperforms existing multiset methods when the true biological process is well approximated by continuous activity measures and a linear association between genes and gene sets.
format Online
Article
Text
id pubmed-5499546
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5499546 2017-07-10
Gene set selection via LASSO penalized regression (SLPR)
Frost, H. Robert
Amos, Christopher I.
Nucleic Acids Res
Methods Online
Oxford University Press 2017-07-07 2017-05-02
/pmc/articles/PMC5499546/
/pubmed/28472344
http://dx.doi.org/10.1093/nar/gkx291
Text en
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gene set selection via LASSO penalized regression (SLPR)
topic Methods Online
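
The description above frames multiset gene set testing as a penalized multiple linear regression of continuous gene activity on the gene sets in a collection. The following is a minimal sketch of that idea and not the authors' implementation. It assumes gene sets enter the model as 0/1 membership columns, omits an intercept, and uses a small hand-written coordinate-descent LASSO solver. The gene names, set names, activity values, and penalty value are all hypothetical.

```python
# Illustrative sketch only: toy data, hypothetical names, no intercept.

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the LASSO proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=100, tol=1e-10):
    """Minimize (1/(2n)) * ||y - X b||^2 + penalty * ||b||_1 over b."""
    n, p = len(y), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X b, and b starts at zero
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue  # empty gene set: coefficient stays at zero
            # Correlation of column j with the residual that excludes b_j.
            rho = sum(col[i] * (residual[i] + col[i] * coef[j])
                      for i in range(n)) / n
            new_value = soft_threshold(rho, penalty) / scale
            delta = new_value - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                coef[j] = new_value
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return coef


# Hypothetical collection: SET_B overlaps SET_A on genes g3 and g4.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
gene_sets = {
    "SET_A": {"g1", "g2", "g3", "g4"},
    "SET_B": {"g3", "g4", "g5", "g6"},
    "SET_C": {"g7", "g8"},
}
set_names = list(gene_sets)

# Membership matrix: one row per gene, one 0/1 column per gene set.
X = [[1.0 if g in gene_sets[s] else 0.0 for s in set_names] for g in genes]

# Continuous gene activity driven only by SET_A in this toy example.
gene_activity = [2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0]

coef = lasso_coordinate_descent(X, gene_activity, penalty=0.1)
selected = [s for s, b in zip(set_names, coef) if b != 0.0]
print(dict(zip(set_names, coef)))
print("selected gene sets:", selected)
```

In this toy setup the L1 penalty keeps the overlapping SET_B at zero even though half of its genes are active, which illustrates how a penalized joint model can return a parsimonious group of sets. Choosing the penalty value (for example by cross-validation) is left out of the sketch.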