Detecting disease-causing genes by LASSO-Patternsearch algorithm
The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data sets for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data g...
Main authors: | Shi, Weiliang; Lee, Kristine E; Wahba, Grace |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367607/ https://www.ncbi.nlm.nih.gov/pubmed/18466561 |
_version_ | 1782154332627533824 |
---|---|
author | Shi, Weiliang; Lee, Kristine E; Wahba, Grace |
author_facet | Shi, Weiliang; Lee, Kristine E; Wahba, Grace |
author_sort | Shi, Weiliang |
collection | PubMed |
description | The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data sets for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data generation model included nine unobserved trait loci, most of which have one or more of the generated SNPs associated with them. These data sets provide an ideal experimental test bed for evaluating new and old algorithms for selecting SNPs and covariates that can separate cases from controls, because the cases and controls are known, as are the identities of the trait loci. LASSO-Patternsearch is a new multi-step algorithm, with a LASSO-type penalized likelihood method at its core, specifically designed to detect and model interactions between important predictor variables. In this article the original LASSO-Patternsearch algorithm is modified to handle the large number of SNPs plus covariates. We start with a screen step within the framework of parametric logistic regression. The patterns that survive the screen step are further selected by a penalized logistic regression with the LASSO penalty. Finally, a parametric logistic regression model is built on the patterns that survive the LASSO step. In our analysis of the Genetic Analysis Workshop 15 Problem 3 data we identified most of the associated SNPs and relevant covariates. When the model was used as a classifier, very competitive error rates were obtained. |
format | Text |
id | pubmed-2367607 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2367607 2008-05-06 Detecting disease-causing genes by LASSO-Patternsearch algorithm Shi, Weiliang; Lee, Kristine E; Wahba, Grace BMC Proc Proceedings BioMed Central 2007-12-18 /pmc/articles/PMC2367607/ /pubmed/18466561 Text en Copyright © 2007 Shi et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Shi, Weiliang Lee, Kristine E Wahba, Grace Detecting disease-causing genes by LASSO-Patternsearch algorithm |
title | Detecting disease-causing genes by LASSO-Patternsearch algorithm |
title_full | Detecting disease-causing genes by LASSO-Patternsearch algorithm |
title_fullStr | Detecting disease-causing genes by LASSO-Patternsearch algorithm |
title_full_unstemmed | Detecting disease-causing genes by LASSO-Patternsearch algorithm |
title_short | Detecting disease-causing genes by LASSO-Patternsearch algorithm |
title_sort | detecting disease-causing genes by lasso-patternsearch algorithm |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367607/ https://www.ncbi.nlm.nih.gov/pubmed/18466561 |
work_keys_str_mv | AT shiweiliang detectingdiseasecausinggenesbylassopatternsearchalgorithm AT leekristinee detectingdiseasecausinggenesbylassopatternsearchalgorithm AT wahbagrace detectingdiseasecausinggenesbylassopatternsearchalgorithm |
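
The three steps named in the description (a screen within parametric logistic regression, a LASSO-penalized logistic regression on the survivors, and a final parametric logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation. The simulated data, the pattern definition (main effects and pairwise products of binary variables), the likelihood-ratio cutoff of 3.84, the penalty `l1=0.02`, and the proximal-gradient solver are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import itertools
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bernoulli_loglik(k, n):
    """Maximized log-likelihood of k successes in n Bernoulli trials."""
    return sum(c * math.log(c / n) for c in (k, n - k) if c > 0)


def screen(patterns, y, threshold=3.84):
    """Step 1 (assumed form): keep each binary pattern whose one-variable
    logistic model beats the intercept-only model by a likelihood-ratio
    statistic above `threshold`. For a binary predictor the fitted
    probabilities are the two group proportions, so no iteration is needed."""
    null = bernoulli_loglik(sum(y), len(y))
    kept = []
    for name, col in patterns.items():
        ll = 0.0
        for v in (0, 1):
            ys = [yi for xi, yi in zip(col, y) if xi == v]
            if ys:
                ll += bernoulli_loglik(sum(ys), len(ys))
        if 2.0 * (ll - null) > threshold:
            kept.append(name)
    return kept


def fit_logistic(patterns, names, y, l1=0.0, steps=300, lr=0.5):
    """Proximal gradient descent for logistic regression with an optional
    L1 penalty on the pattern coefficients (the intercept is unpenalized)."""
    n = len(y)
    rows = [tuple(patterns[k][i] for k in names) for i in range(n)]
    b0, b = 0.0, [0.0] * len(names)
    for _ in range(steps):
        g0, g = 0.0, [0.0] * len(names)
        for row, yi in zip(rows, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
            g0 += r
            for j, xj in enumerate(row):
                g[j] += r * xj
        b0 -= lr * g0 / n
        for j in range(len(b)):
            step = b[j] - lr * g[j] / n
            # Soft-thresholding: sets small coefficients exactly to zero.
            b[j] = math.copysign(max(abs(step) - lr * l1, 0.0), step)
    return b0, dict(zip(names, b))


# Simulated data (hypothetical): 5 binary variables, 400 subjects.
random.seed(0)
n, d = 400, 5
X = [[random.randint(0, 1) for _ in range(d)] for _ in range(n)]
# Assumed truth: a main effect of variable 0 and an interaction of 1 and 2.
y = [int(random.random() < sigmoid(-2.0 + 2.0 * x[0] + 3.0 * x[1] * x[2]))
     for x in X]

# Candidate patterns: main effects and pairwise products, keyed by index tuple.
patterns = {}
for size in (1, 2):
    for idx in itertools.combinations(range(d), size):
        patterns[idx] = [int(all(x[j] for j in idx)) for x in X]

screened = screen(patterns, y)                                   # step 1
_, lasso_coef = fit_logistic(patterns, screened, y, l1=0.02)     # step 2
selected = [k for k in screened if lasso_coef[k] != 0.0]
b0, final_coef = fit_logistic(patterns, selected, y)             # step 3

# Use the final model as a classifier on the training data.
scores = [b0 + sum(c * patterns[k][i] for k, c in final_coef.items())
          for i in range(n)]
train_error = sum((s > 0) != bool(yi) for s, yi in zip(scores, y)) / n

print("screened:", screened)
print("selected:", selected)
print("training error:", train_error)
```

The screen uses the closed-form fit for a single binary predictor so that only the LASSO and final steps need an iterative solver. In practice the penalty would be tuned rather than fixed, and the error rate would be estimated on held-out data rather than on the training sample.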