Genetic distance as an alternative to physical distance for definition of gene units in association studies

BACKGROUND: Some association studies, such as those implemented in VEGAS, ALIGATOR, i-GSEA4GWAS, GSA-SNP and other software tools, use genes as the unit of analysis. These genes include the coding sequence plus flanking sequences. Polymorphisms in the flanking sequences are of interest because they involve...

Bibliographic Details
Main authors: Rodriguez-Fontenla, Cristina, Calaza, Manuel, Gonzalez, Antonio
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4048458/
https://www.ncbi.nlm.nih.gov/pubmed/24884992
http://dx.doi.org/10.1186/1471-2164-15-408
_version_ 1782480528436363264
author Rodriguez-Fontenla, Cristina
Calaza, Manuel
Gonzalez, Antonio
author_facet Rodriguez-Fontenla, Cristina
Calaza, Manuel
Gonzalez, Antonio
author_sort Rodriguez-Fontenla, Cristina
collection PubMed
description BACKGROUND: Some association studies, such as those implemented in VEGAS, ALIGATOR, i-GSEA4GWAS, GSA-SNP and other software tools, use genes as the unit of analysis. These genes include the coding sequence plus flanking sequences. Polymorphisms in the flanking sequences are of interest because they involve cis-regulatory elements or they inform on untyped genetic variants through linkage disequilibrium. Gene extensions have customarily been defined as ± 50 Kb. This approach is not fully satisfactory because genetic relationships between neighbouring sequences are a function of genetic distance, for which physical distance is only a poor proxy. RESULTS: Standardized recombination rates (SRR) from the deCODE recombination map were used as units of genetic distance. We searched for an SRR producing flanking sequences near the ± 50 Kb offset that has been common in previous studies. An SRR ≥ 2 was selected because it led to gene extensions with a median length of 45.3 Kb and has the simplicity of an integer value. As expected, the boundaries of the genes defined with the ± 50 Kb and the SRR ≥ 2 rules were rarely concordant. The impact of these differences was illustrated by interpreting the top association signals from two large studies, each of which included many hits and a detailed analysis of them based on different criteria. The definition based on genetic distance was more concordant with the results of these studies than the one based on physical distance. In the analysis of 18 top disease-associated loci from the first study, the SRR ≥ 2 genes led to a fully concordant interpretation in 17 loci; the ± 50 Kb genes only in 6. Interpretation of the 43 putative functional genes of the second study based on the SRR ≥ 2 definition missed only 4 genes, whereas interpretation based on the ± 50 Kb definition missed 10. 
CONCLUSIONS: A gene definition based on genetic distance led to results more concordant with detailed expert analyses than the commonly used definition based on physical distance. The genome coordinates for each gene are provided to keep the new definitions simple to use. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-408) contains supplementary material, which is available to authorized users.
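The two gene definitions compared in the abstract can be contrasted with a minimal sketch. It is not the authors' implementation: it assumes the recombination map is available as sorted, non-overlapping bins on one chromosome, and that a flank grows outward from the gene until it meets the first bin with SRR at or above the threshold. The names `MapBin`, `extend_physical` and `extend_genetic`, the 10 Kb bin size and all SRR values below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapBin:
    """One interval of a recombination map (hypothetical layout)."""
    start: int   # bp, inclusive
    end: int     # bp, exclusive
    srr: float   # standardized recombination rate of the interval


def extend_physical(gene_start, gene_end, offset=50_000):
    """Fixed-offset rule: add the same physical distance on both sides."""
    return max(0, gene_start - offset), gene_end + offset


def extend_genetic(gene_start, gene_end, bins, threshold=2.0):
    """Assumed rule: grow each flank bin by bin and stop just before
    the first bin whose SRR is >= threshold.

    `bins` must be sorted by start, non-overlapping, same chromosome.
    """
    left = gene_start
    upstream = [b for b in bins if b.end <= gene_start]
    for b in reversed(upstream):          # walk away from the gene
        if b.srr >= threshold:
            break
        left = b.start

    right = gene_end
    downstream = [b for b in bins if b.start >= gene_end]
    for b in downstream:                  # walk away from the gene
        if b.srr >= threshold:
            break
        right = b.end
    return left, right


# Toy map: 10 Kb bins with a low background rate and two made-up hotspots.
hot = {60_000: 2.5, 150_000: 3.0}
bins = [MapBin(s, s + 10_000, hot.get(s, 0.5))
        for s in range(0, 200_000, 10_000)]

physical = extend_physical(100_000, 120_000)
genetic = extend_genetic(100_000, 120_000, bins)
print(physical)  # (50000, 170000)
print(genetic)   # (70000, 150000)
```

In this toy map the fixed offset crosses both hotspots, while the SRR-based flanks stop 30 Kb from the gene on each side, which illustrates why the boundaries produced by the two rules would rarely coincide.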
format Online
Article
Text
id pubmed-4048458
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4048458 2014-06-17 Genetic distance as an alternative to physical distance for definition of gene units in association studies Rodriguez-Fontenla, Cristina Calaza, Manuel Gonzalez, Antonio BMC Genomics Methodology Article BACKGROUND: Some association studies, such as those implemented in VEGAS, ALIGATOR, i-GSEA4GWAS, GSA-SNP and other software tools, use genes as the unit of analysis. These genes include the coding sequence plus flanking sequences. Polymorphisms in the flanking sequences are of interest because they involve cis-regulatory elements or they inform on untyped genetic variants through linkage disequilibrium. Gene extensions have customarily been defined as ± 50 Kb. This approach is not fully satisfactory because genetic relationships between neighbouring sequences are a function of genetic distance, for which physical distance is only a poor proxy. RESULTS: Standardized recombination rates (SRR) from the deCODE recombination map were used as units of genetic distance. We searched for an SRR producing flanking sequences near the ± 50 Kb offset that has been common in previous studies. An SRR ≥ 2 was selected because it led to gene extensions with a median length of 45.3 Kb and has the simplicity of an integer value. As expected, the boundaries of the genes defined with the ± 50 Kb and the SRR ≥ 2 rules were rarely concordant. The impact of these differences was illustrated by interpreting the top association signals from two large studies, each of which included many hits and a detailed analysis of them based on different criteria. The definition based on genetic distance was more concordant with the results of these studies than the one based on physical distance. In the analysis of 18 top disease-associated loci from the first study, the SRR ≥ 2 genes led to a fully concordant interpretation in 17 loci; the ± 50 Kb genes only in 6. 
Interpretation of the 43 putative functional genes of the second study based on the SRR ≥ 2 definition missed only 4 genes, whereas interpretation based on the ± 50 Kb definition missed 10. CONCLUSIONS: A gene definition based on genetic distance led to results more concordant with detailed expert analyses than the commonly used definition based on physical distance. The genome coordinates for each gene are provided to keep the new definitions simple to use. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-408) contains supplementary material, which is available to authorized users. BioMed Central 2014-05-28 /pmc/articles/PMC4048458/ /pubmed/24884992 http://dx.doi.org/10.1186/1471-2164-15-408 Text en © Rodriguez-Fontenla et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Rodriguez-Fontenla, Cristina
Calaza, Manuel
Gonzalez, Antonio
Genetic distance as an alternative to physical distance for definition of gene units in association studies
title Genetic distance as an alternative to physical distance for definition of gene units in association studies
title_full Genetic distance as an alternative to physical distance for definition of gene units in association studies
title_fullStr Genetic distance as an alternative to physical distance for definition of gene units in association studies
title_full_unstemmed Genetic distance as an alternative to physical distance for definition of gene units in association studies
title_short Genetic distance as an alternative to physical distance for definition of gene units in association studies
title_sort genetic distance as an alternative to physical distance for definition of gene units in association studies
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4048458/
https://www.ncbi.nlm.nih.gov/pubmed/24884992
http://dx.doi.org/10.1186/1471-2164-15-408
work_keys_str_mv AT rodriguezfontenlacristina geneticdistanceasanalternativetophysicaldistancefordefinitionofgeneunitsinassociationstudies
AT calazamanuel geneticdistanceasanalternativetophysicaldistancefordefinitionofgeneunitsinassociationstudies
AT gonzalezantonio geneticdistanceasanalternativetophysicaldistancefordefinitionofgeneunitsinassociationstudies