The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases

Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods...

Bibliographic Details
Main Authors: Heidema, A Geert; Boer, Jolanda MA; Nagelkerke, Nico; Mariman, Edwin CM; van der A, Daphne L; Feskens, Edith JM
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1479365/
https://www.ncbi.nlm.nih.gov/pubmed/16630340
http://dx.doi.org/10.1186/1471-2156-7-23
author Heidema, A Geert
Boer, Jolanda MA
Nagelkerke, Nico
Mariman, Edwin CM
van der A, Daphne L
Feskens, Edith JM
author_sort Heidema, A Geert
collection PubMed
description Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN) and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods to approach association studies with large numbers of predictor variables. GPNN on the other hand may be a useful approach to select and model important predictors, but its performance to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight in combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. 
As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.
format Text
id pubmed-1479365
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1479365 2006-06-20 BMC Genet Commentary BioMed Central 2006-04-21 /pmc/articles/PMC1479365/ /pubmed/16630340 http://dx.doi.org/10.1186/1471-2156-7-23 Text en Copyright © 2006 Heidema et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases
topic Commentary
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1479365/
https://www.ncbi.nlm.nih.gov/pubmed/16630340
http://dx.doi.org/10.1186/1471-2156-7-23