Marked variation in predicted and observed variability of tandem repeat loci across the human genome

BACKGROUND: Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2–12 nucleotides in Whole Genome Shotgun (WGS) sequences to...

Bibliographic Details
Main Authors: O'Dushlaine, Colm T, Shields, Denis C
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2364633/
https://www.ncbi.nlm.nih.gov/pubmed/18416815
http://dx.doi.org/10.1186/1471-2164-9-175
_version_ 1782153998290124800
author O'Dushlaine, Colm T
Shields, Denis C
collection PubMed
description BACKGROUND: Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2–12 nucleotides in Whole Genome Shotgun (WGS) sequences to define the extent of variation of 209,214 unique repeat loci throughout the genome. RESULTS: We applied a multivariate statistical model to predict TR variability. Predicted heterozygosity correlated with heterozygosity in the CEPH polymorphism database (correlation ρ = 0.29, p < 0.0005) better than the correlation between the CEPH and WGS data (ρ = 0.17), presumably because the model smoothes noise from small sample sizes. A multivariate logistic model of 8 parameters accounted for 36% of the variation in the WGS data. Validation studies of 70 experimentally investigated TRs revealed high concordance with the model's predictions (p < 0.0001). CONCLUSION: Variability among 2–12-mer TRs in the genome can be modeled by a few parameters, which do not markedly differ according to unit length, consistent with a common mechanism for the generation of variability among such TRs. Analysis of the distributions of observed and predicted variants across the genome showed a general concordance, indicating that the repeat variation dataset does not exhibit strong regional ascertainment biases. This revealed a deficit of variant repeats in chromosomes 19 and Y – likely to reflect a reduction in 2-mer repeats in the former and a reduced level of recombination in the latter – and excesses in chromosomes 6, 13, 20 and 21.
format Text
id pubmed-2364633
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2364633 2008-05-02 Marked variation in predicted and observed variability of tandem repeat loci across the human genome O'Dushlaine, Colm T Shields, Denis C BMC Genomics Research Article
BioMed Central 2008-04-16 /pmc/articles/PMC2364633/ /pubmed/18416815 http://dx.doi.org/10.1186/1471-2164-9-175 Text en Copyright © 2008 O'Dushlaine and Shields; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Marked variation in predicted and observed variability of tandem repeat loci across the human genome
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2364633/
https://www.ncbi.nlm.nih.gov/pubmed/18416815
http://dx.doi.org/10.1186/1471-2164-9-175