
The partitioned LASSO-patternsearch algorithm with application to gene expression data

BACKGROUND: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects...


Bibliographic Details
Main authors: Shi, Weiliang, Wahba, Grace, Irizarry, Rafael A, Bravo, Hector Corrada, Wright, Stephen J
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505477/
https://www.ncbi.nlm.nih.gov/pubmed/22587526
http://dx.doi.org/10.1186/1471-2105-13-98
_version_ 1782250762672275456
author Shi, Weiliang
Wahba, Grace
Irizarry, Rafael A
Bravo, Hector Corrada
Wright, Stephen J
author_facet Shi, Weiliang
Wahba, Grace
Irizarry, Rafael A
Bravo, Hector Corrada
Wright, Stephen J
author_sort Shi, Weiliang
collection PubMed
description BACKGROUND: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself perfectly to modern regularization approaches that thrive in high-dimensional settings. RESULTS: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by the gene expression barcode. Most of the genes and second-order interactions thus selected are known to be related to the outcomes. CONCLUSIONS: We demonstrate with simulations and data analyses that the proposed method not only selects variables and patterns more accurately, but also provides smaller models with better prediction accuracy, in comparison to several alternative methodologies.
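The two stages described in the abstract (partitioned screening, then aggregation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the blocks are consecutive groups of variables, each subproblem is an L1-penalized logistic regression on main effects and second-order interactions solved by proximal gradient descent, and the penalty `lam` is fixed rather than tuned. The function names (`soft_threshold`, `patterns_for`, `lasso_patternsearch`, `partitioned_lps`) are hypothetical, and the screening loop runs sequentially although the subproblems are independent and could be dispatched in parallel.

```python
import math
from itertools import combinations, product


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def patterns_for(variables):
    """Main effects and second-order interactions over the given variables."""
    vs = sorted(variables)
    return [(j,) for j in vs] + list(combinations(vs, 2))


def lasso_patternsearch(X, y, variables, lam=0.05, steps=2000):
    """L1-penalized logistic regression on 0/1 pattern indicators.

    Solved by proximal gradient descent (ISTA); the intercept is unpenalized.
    Returns the patterns whose coefficients are nonzero.
    """
    pats = patterns_for(variables)
    if not pats:
        return []
    n = len(X)
    # active[i]: indices of patterns equal to 1 on row i (product of 0/1 entries).
    active = [[k for k, pat in enumerate(pats) if all(row[j] for j in pat)]
              for row in X]
    ybar = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    b = math.log(ybar / (1 - ybar))   # start the intercept at the base-rate logit
    w = [0.0] * len(pats)
    lr = 4.0 / (len(pats) + 1)        # safe step size for 0/1 features
    for _ in range(steps):
        gb, gw = 0.0, [0.0] * len(pats)
        for act, yi in zip(active, y):
            z = b + sum(w[k] for k in act)
            z = max(-30.0, min(30.0, z))          # guard against overflow in exp
            r = 1.0 / (1.0 + math.exp(-z)) - yi   # fitted probability minus label
            gb += r
            for k in act:
                gw[k] += r
        b -= lr * gb / n
        w = [soft_threshold(wk - lr * gk / n, lr * lam) for wk, gk in zip(w, gw)]
    return [pat for pat, wk in zip(pats, w) if abs(wk) > 1e-12]


def partitioned_lps(X, y, block_size, lam=0.05):
    """Screen variables block by block, then refit on the survivors only."""
    p = len(X[0])
    blocks = [list(range(s, min(s + block_size, p)))
              for s in range(0, p, block_size)]
    survivors = set()
    # Screening stage: one independent subproblem per block.
    for block in blocks:
        for pat in lasso_patternsearch(X, y, block, lam):
            survivors.update(pat)
    # Aggregation stage: a single reduced problem in the surviving variables.
    final = lasso_patternsearch(X, y, sorted(survivors), lam)
    return sorted(survivors), final


# Toy design: every 0/1 combination of 6 variables; the outcome is x0 AND x1.
X = [list(bits) for bits in product([0, 1], repeat=6)]
y = [row[0] * row[1] for row in X]
survivors, final = partitioned_lps(X, y, block_size=3)
print("surviving variables:", survivors)
print("selected patterns:", final)
```

On this toy design, variables 0 and 1 pass the screening stage and the interaction pattern `(0, 1)` is retained after aggregation, while the block containing variables 3 to 5 contributes nothing. Because this sketch only forms interactions within a block during screening, an interaction that spans two blocks would be found only if at least one pattern involving each of its variables survives on its own; how the published method handles cross-block patterns is not stated in this record.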
format Online
Article
Text
id pubmed-3505477
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3505477 2012-11-29 The partitioned LASSO-patternsearch algorithm with application to gene expression data Shi, Weiliang Wahba, Grace Irizarry, Rafael A Bravo, Hector Corrada Wright, Stephen J BMC Bioinformatics Methodology Article BACKGROUND: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself perfectly to modern regularization approaches that thrive in high-dimensional settings. RESULTS: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by the gene expression barcode. Most of the genes and second-order interactions thus selected are known to be related to the outcomes. CONCLUSIONS: We demonstrate with simulations and data analyses that the proposed method not only selects variables and patterns more accurately, but also provides smaller models with better prediction accuracy, in comparison to several alternative methodologies. BioMed Central 2012-05-15 /pmc/articles/PMC3505477/ /pubmed/22587526 http://dx.doi.org/10.1186/1471-2105-13-98 Text en Copyright ©2012 Shi et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Shi, Weiliang
Wahba, Grace
Irizarry, Rafael A
Bravo, Hector Corrada
Wright, Stephen J
The partitioned LASSO-patternsearch algorithm with application to gene expression data
title The partitioned LASSO-patternsearch algorithm with application to gene expression data
title_full The partitioned LASSO-patternsearch algorithm with application to gene expression data
title_fullStr The partitioned LASSO-patternsearch algorithm with application to gene expression data
title_full_unstemmed The partitioned LASSO-patternsearch algorithm with application to gene expression data
title_short The partitioned LASSO-patternsearch algorithm with application to gene expression data
title_sort partitioned lasso-patternsearch algorithm with application to gene expression data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505477/
https://www.ncbi.nlm.nih.gov/pubmed/22587526
http://dx.doi.org/10.1186/1471-2105-13-98