Genetic distance as an alternative to physical distance for definition of gene units in association studies
Main authors: | Rodriguez-Fontenla, Cristina; Calaza, Manuel; Gonzalez, Antonio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4048458/ https://www.ncbi.nlm.nih.gov/pubmed/24884992 http://dx.doi.org/10.1186/1471-2164-15-408 |
_version_ | 1782480528436363264 |
---|---|
author | Rodriguez-Fontenla, Cristina; Calaza, Manuel; Gonzalez, Antonio |
collection | PubMed |
description | BACKGROUND: Some association studies, such as those implemented in VEGAS, ALIGATOR, i-GSEA4GWAS, GSA-SNP and other software tools, use genes as the unit of analysis. These genes include the coding sequence plus flanking sequences. Polymorphisms in the flanking sequences are of interest because they involve cis-regulatory elements or inform on untyped genetic variants through linkage disequilibrium. Gene extensions have customarily been defined as ± 50 kb. This approach is not fully satisfactory because genetic relationships between neighbouring sequences are a function of genetic distance, for which physical distance is only a poor substitute. RESULTS: Standardized recombination rates (SRR) from the deCODE recombination map were used as units of genetic distance. We searched for an SRR threshold producing flanking sequences close to the ± 50 kb offset that has been common in previous studies. SRR ≥ 2 was selected because it led to gene extensions with a median length of 45.3 kb and has the simplicity of an integer value. As expected, the gene boundaries defined with the ± 50 kb rule and with the SRR ≥ 2 rule were rarely concordant. The impact of these differences was illustrated by interpreting the top association signals from two large studies that reported many hits together with detailed analyses of them based on different criteria. The definition based on genetic distance was more concordant with the results of these studies than the one based on physical distance. In the analysis of 18 top disease-associated loci from the first study, the SRR ≥ 2 genes led to a fully concordant interpretation in 17 loci; the ± 50 kb genes did so in only 6. Interpretation of the 43 putative functional genes of the second study based on the SRR ≥ 2 definition missed only 4 of the genes, whereas interpretation based on the ± 50 kb definition missed 10.
CONCLUSIONS: A gene definition based on genetic distance led to results more concordant with detailed expert analyses than the commonly used definition based on physical distance. The genome coordinates for each gene are provided so that the new definitions remain simple to use. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-408) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4048458 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4048458 2014-06-17 Genetic distance as an alternative to physical distance for definition of gene units in association studies Rodriguez-Fontenla, Cristina; Calaza, Manuel; Gonzalez, Antonio BMC Genomics Methodology Article BioMed Central 2014-05-28 /pmc/articles/PMC4048458/ /pubmed/24884992 http://dx.doi.org/10.1186/1471-2164-15-408 Text en © Rodriguez-Fontenla et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Genetic distance as an alternative to physical distance for definition of gene units in association studies |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4048458/ https://www.ncbi.nlm.nih.gov/pubmed/24884992 http://dx.doi.org/10.1186/1471-2164-15-408 |
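The two gene-unit rules compared in the abstract (a fixed ± 50 kb offset versus flanks bounded by recombination rate) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the deCODE map is available as sorted, non-overlapping bins, each carrying a standardized recombination rate (SRR), and that an SRR-based gene unit extends each flank up to, but not including, the nearest bin with SRR ≥ 2. The names `Gene`, `MapBin`, `extend_physical` and `extend_genetic`, and all coordinates and SRR values below, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # first base of the gene (1-based, inclusive)
    end: int    # last base of the gene (inclusive)


@dataclass(frozen=True)
class MapBin:
    start: int
    end: int
    srr: float  # standardized recombination rate (assumed: local rate / genome-wide average)


def extend_physical(gene: Gene, offset: int = 50_000) -> tuple[int, int]:
    """Fixed-offset rule: gene body plus `offset` bp on each side, clipped at base 1."""
    return max(1, gene.start - offset), gene.end + offset


def extend_genetic(gene: Gene, bins: list[MapBin], threshold: float = 2.0) -> tuple[int, int]:
    """Hypothetical genetic-distance rule: extend each flank up to, but not
    including, the nearest map bin with SRR >= threshold. If a side has no such
    bin, fall back to the edge of the map (or the gene itself, if wider).
    `bins` must be sorted by position and cover the gene's chromosome region."""
    if not bins:
        raise ValueError("need at least one recombination-map bin")
    hot = [b for b in bins if b.srr >= threshold]
    upstream = [b for b in hot if b.end < gene.start]
    downstream = [b for b in hot if b.start > gene.end]
    left = max(b.end for b in upstream) + 1 if upstream else min(gene.start, bins[0].start)
    right = min(b.start for b in downstream) - 1 if downstream else max(gene.end, bins[-1].end)
    return left, right


# Hypothetical gene and map bins, for illustration only.
gene = Gene("GENE_X", 1_000_000, 1_020_000)
bins = [
    MapBin(940_000, 949_999, 3.1),
    MapBin(950_000, 959_999, 0.4),
    MapBin(960_000, 969_999, 2.5),
    MapBin(970_000, 999_999, 0.8),
    MapBin(1_000_000, 1_019_999, 0.2),
    MapBin(1_020_000, 1_029_999, 1.1),
    MapBin(1_030_000, 1_039_999, 2.0),
    MapBin(1_040_000, 1_100_000, 0.5),
]

print(extend_physical(gene))       # (950000, 1070000)
print(extend_genetic(gene, bins))  # (970000, 1029999)
```

In this toy example the SRR rule gives an asymmetric unit (30 kb upstream, about 10 kb downstream) where the fixed rule gives 50 kb on each side, which is the kind of boundary discordance the abstract describes. Whether the hotspot bin itself belongs to the unit, and how the map is binned, are assumptions here; the paper's methods and its supplementary gene coordinates are the authoritative source.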