Marked variation in predicted and observed variability of tandem repeat loci across the human genome
BACKGROUND: Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2–12 nucleotides in Whole Genome Shotgun (WGS) sequences to...
Main Authors: | O'Dushlaine, Colm T; Shields, Denis C |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2364633/ https://www.ncbi.nlm.nih.gov/pubmed/18416815 http://dx.doi.org/10.1186/1471-2164-9-175 |
_version_ | 1782153998290124800 |
---|---|
author | O'Dushlaine, Colm T Shields, Denis C |
author_sort | O'Dushlaine, Colm T |
collection | PubMed |
description | BACKGROUND: Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2–12 nucleotides in Whole Genome Shotgun (WGS) sequences to define the extent of variation of 209,214 unique repeat loci throughout the genome. RESULTS: We applied a multivariate statistical model to predict TR variability. Predicted heterozygosity correlated with heterozygosity in the CEPH polymorphism database (correlation ρ = 0.29, p < 0.0005) better than the correlation between the CEPH and WGS data (ρ = 0.17), presumably because the model smoothes noise from small sample sizes. A multivariate logistic model of 8 parameters accounted for 36% of the variation in the WGS data. Validation studies of 70 experimentally investigated TRs revealed high concordance with the model's predictions (p < 0.0001). CONCLUSION: Variability among 2–12-mer TRs in the genome can be modeled by a few parameters, which do not markedly differ according to unit length, consistent with a common mechanism for the generation of variability among such TRs. Analysis of the distributions of observed and predicted variants across the genome showed a general concordance, indicating that the repeat variation dataset does not exhibit strong regional ascertainment biases. This revealed a deficit of variant repeats in chromosomes 19 and Y – likely to reflect a reduction in 2-mer repeats in the former and a reduced level of recombination in the latter – and excesses in chromosomes 6, 13, 20 and 21. |
format | Text |
id | pubmed-2364633 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2364633 2008-05-02 Marked variation in predicted and observed variability of tandem repeat loci across the human genome O'Dushlaine, Colm T Shields, Denis C BMC Genomics Research Article BioMed Central 2008-04-16 /pmc/articles/PMC2364633/ /pubmed/18416815 http://dx.doi.org/10.1186/1471-2164-9-175 Text en Copyright © 2008 O'Dushlaine and Shields; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Marked variation in predicted and observed variability of tandem repeat loci across the human genome |
topic | Research Article |