
Detecting disease-causing genes by LASSO-Patternsearch algorithm

The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data sets for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data g...


Bibliographic Details
Main authors: Shi, Weiliang; Lee, Kristine E; Wahba, Grace
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367607/
https://www.ncbi.nlm.nih.gov/pubmed/18466561
_version_ 1782154332627533824
author Shi, Weiliang
Lee, Kristine E
Wahba, Grace
author_facet Shi, Weiliang
Lee, Kristine E
Wahba, Grace
author_sort Shi, Weiliang
collection PubMed
description The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data sets for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data generation model included nine unobserved trait loci, most of which have one or more of the generated SNPs associated with them. These data sets provide an ideal experimental test bed for evaluating new and old algorithms for selecting SNPs and covariates that can separate cases from controls, because both the case-control status and the identities of the trait loci are known. LASSO-Patternsearch is a new multi-step algorithm with a LASSO-type penalized likelihood method at its core, specifically designed to detect and model interactions between important predictor variables. In this article the original LASSO-Patternsearch algorithm is modified to handle the large number of SNPs plus covariates. We start with a screening step within the framework of parametric logistic regression. The patterns that survive the screening step are further selected by a penalized logistic regression with the LASSO penalty. Finally, a parametric logistic regression model is built on the patterns that survive the LASSO step. In our analysis of the Genetic Analysis Workshop 15 Problem 3 data we identified most of the associated SNPs and relevant covariates. When the model was used as a classifier, very competitive error rates were obtained.
format Text
id pubmed-2367607
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2367607 2008-05-06 Detecting disease-causing genes by LASSO-Patternsearch algorithm Shi, Weiliang Lee, Kristine E Wahba, Grace BMC Proc Proceedings The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data sets for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data generation model included nine unobserved trait loci, most of which have one or more of the generated SNPs associated with them. These data sets provide an ideal experimental test bed for evaluating new and old algorithms for selecting SNPs and covariates that can separate cases from controls, because both the case-control status and the identities of the trait loci are known. LASSO-Patternsearch is a new multi-step algorithm with a LASSO-type penalized likelihood method at its core, specifically designed to detect and model interactions between important predictor variables. In this article the original LASSO-Patternsearch algorithm is modified to handle the large number of SNPs plus covariates. We start with a screening step within the framework of parametric logistic regression. The patterns that survive the screening step are further selected by a penalized logistic regression with the LASSO penalty. Finally, a parametric logistic regression model is built on the patterns that survive the LASSO step. In our analysis of the Genetic Analysis Workshop 15 Problem 3 data we identified most of the associated SNPs and relevant covariates. When the model was used as a classifier, very competitive error rates were obtained. BioMed Central 2007-12-18 /pmc/articles/PMC2367607/ /pubmed/18466561 Text en Copyright © 2007 Shi et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Shi, Weiliang
Lee, Kristine E
Wahba, Grace
Detecting disease-causing genes by LASSO-Patternsearch algorithm
title Detecting disease-causing genes by LASSO-Patternsearch algorithm
title_full Detecting disease-causing genes by LASSO-Patternsearch algorithm
title_fullStr Detecting disease-causing genes by LASSO-Patternsearch algorithm
title_full_unstemmed Detecting disease-causing genes by LASSO-Patternsearch algorithm
title_short Detecting disease-causing genes by LASSO-Patternsearch algorithm
title_sort detecting disease-causing genes by lasso-patternsearch algorithm
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367607/
https://www.ncbi.nlm.nih.gov/pubmed/18466561
work_keys_str_mv AT shiweiliang detectingdiseasecausinggenesbylassopatternsearchalgorithm
AT leekristinee detectingdiseasecausinggenesbylassopatternsearchalgorithm
AT wahbagrace detectingdiseasecausinggenesbylassopatternsearchalgorithm
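
The description field above outlines a three-step procedure: a screening step based on parametric logistic regression, a LASSO-penalized logistic regression on the survivors, and a final unpenalized logistic regression on what the LASSO keeps. The following is a minimal sketch of that general shape, not the authors' implementation. It makes several simplifying assumptions that do not come from the record: predictors are binary main effects only (no interaction patterns), the screen uses a univariate score test with a chi-square cutoff of 3.84, the LASSO step is solved by plain proximal gradient descent with a fixed penalty `lam`, and all function names and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_stat(X, y, c):
    # Score test statistic for H0: slope = 0 in a univariate
    # logistic regression of y on column c (assumed screening rule).
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(row[c] for row in X) / n
    u = sum(row[c] * (yi - ybar) for row, yi in zip(X, y))
    v = ybar * (1 - ybar) * sum((row[c] - xbar) ** 2 for row in X)
    return u * u / v if v > 0 else 0.0


def fit_logistic(X, y, cols, lam=0.0, steps=300):
    # Proximal gradient descent on the mean logistic loss over columns `cols`.
    # lam > 0 adds an L1 (LASSO) penalty on the slopes; the intercept is
    # never penalized. Step size 4 / (k + 1) is safe for 0/1 predictors.
    n, k = len(y), len(cols)
    lr = 4.0 / (k + 1)
    b0, b = 0.0, [0.0] * k
    for _ in range(steps):
        g0, g = 0.0, [0.0] * k
        for row, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * row[c] for bj, c in zip(b, cols))) - yi
            g0 += r
            for j, c in enumerate(cols):
                g[j] += r * row[c]
        b0 -= lr * g0 / n
        for j in range(k):
            v = b[j] - lr * g[j] / n
            # Soft-thresholding produces exact zeros for weak predictors.
            b[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return b0, b


def screen_lasso_refit(X, y, chi2_cut=3.84, lam=0.02):
    # Step 1: screen each predictor on its own.
    screened = [c for c in range(len(X[0])) if score_stat(X, y, c) > chi2_cut]
    # Step 2: LASSO-penalized logistic regression on the survivors.
    _, b = fit_logistic(X, y, screened, lam=lam)
    kept = [c for c, bj in zip(screened, b) if bj != 0.0]
    # Step 3: unpenalized logistic regression on what the LASSO kept.
    b0, b = fit_logistic(X, y, kept, lam=0.0)
    return screened, kept, b0, dict(zip(kept, b))


# Toy data (hypothetical): 10 binary predictors, only columns 0 and 3 matter.
random.seed(0)
n, p = 400, 10
X = [[random.randint(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < sigmoid(-1.5 + 2.0 * r[0] + 2.0 * r[3]) else 0
     for r in X]

screened, kept, intercept, coefs = screen_lasso_refit(X, y)
print("passed screen:", screened)
print("kept by LASSO:", kept)
print("refit coefficients:", coefs)
```

In this sketch the penalty `lam` is fixed by hand purely for illustration; a real analysis would choose the tuning parameter by a data-driven criterion, and the record does not say which one the authors used.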