A genome-wide study of Hardy–Weinberg equilibrium with next generation sequence data

Statistical tests for Hardy–Weinberg equilibrium have been an important tool for detecting genotyping errors in the past, and remain important in the quality control of next generation sequence data. In this paper, we analyze complete chromosomes of the 1000 genomes project by using exact test proce...

Bibliographic Details
Main Authors: Graffelman, Jan; Jain, Deepti; Weir, Bruce
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5429372/
https://www.ncbi.nlm.nih.gov/pubmed/28374190
http://dx.doi.org/10.1007/s00439-017-1786-7
_version_ 1783236001040171008
author Graffelman, Jan
Jain, Deepti
Weir, Bruce
collection PubMed
description Statistical tests for Hardy–Weinberg equilibrium have been an important tool for detecting genotyping errors in the past, and remain important in the quality control of next generation sequence data. In this paper, we analyze complete chromosomes of the 1000 genomes project by using exact test procedures for autosomal and X-chromosomal variants. We find that the rate of disequilibrium largely exceeds what might be expected by chance alone for all chromosomes. Observed disequilibrium is, in about 60% of the cases, due to heterozygote excess. We suggest that most excess disequilibrium can be explained by sequencing problems, and hypothesize mechanisms that can explain exceptional heterozygosities. We report higher rates of disequilibrium for the MHC region on chromosome 6, regions flanking centromeres and p-arms of acrocentric chromosomes. We also detected long-range haplotypes and areas with incidental high disequilibrium. We report disequilibrium to be related to read depth, with variants having extreme read depths being more likely to be out of equilibrium. Disequilibrium rates were found to be 11 times higher in segmental duplications and simple tandem repeat regions. The variants with significant disequilibrium are seen to be concentrated in these areas. For next generation sequence data, Hardy–Weinberg disequilibrium seems to be a major indicator for copy number variation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00439-017-1786-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5429372
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-5429372 2017-05-30 A genome-wide study of Hardy–Weinberg equilibrium with next generation sequence data Graffelman, Jan Jain, Deepti Weir, Bruce Hum Genet Original Investigation
Springer Berlin Heidelberg 2017-04-03 2017 /pmc/articles/PMC5429372/ /pubmed/28374190 http://dx.doi.org/10.1007/s00439-017-1786-7 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title A genome-wide study of Hardy–Weinberg equilibrium with next generation sequence data
topic Original Investigation
work_keys_str_mv AT graffelmanjan agenomewidestudyofhardyweinbergequilibriumwithnextgenerationsequencedata
AT jaindeepti agenomewidestudyofhardyweinbergequilibriumwithnextgenerationsequencedata
AT weirbruce agenomewidestudyofhardyweinbergequilibriumwithnextgenerationsequencedata
AT graffelmanjan genomewidestudyofhardyweinbergequilibriumwithnextgenerationsequencedata
AT jaindeepti genomewidestudyofhardyweinbergequilibriumwithnextgenerationsequencedata
AT weirbruce genomewidestudyofhardyweinbergequilibriumwithnextgenerationsequencedata
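Illustrative note on the abstract: the description mentions exact test procedures for Hardy–Weinberg equilibrium. The sketch below shows one common form of such a test for a single biallelic autosomal variant. It enumerates every possible heterozygote count given the observed allele counts and sums the probabilities of outcomes that are no more likely than the observed one. It is an assumption-laden illustration, not necessarily the procedure used by the authors (for example, it reports a standard p-value, and it does not cover X-chromosomal variants). The function names `het_probabilities` and `hwe_exact_test` are hypothetical.

```python
from math import exp, lgamma, log


def log_factorial(k: int) -> float:
    """Natural log of k!, computed through the log-gamma function."""
    return lgamma(k + 1)


def het_probabilities(n_a: int, n: int) -> dict:
    """Probability of each heterozygote count under Hardy-Weinberg equilibrium,
    conditional on n individuals carrying n_a copies of allele A."""
    n_b = 2 * n - n_a
    minor = min(n_a, n_b)
    probs = {}
    # The heterozygote count must have the same parity as the allele counts.
    for n_ab in range(minor % 2, minor + 1, 2):
        n_aa = (n_a - n_ab) // 2
        n_bb = (n_b - n_ab) // 2
        log_p = (
            log_factorial(n)
            - log_factorial(n_aa) - log_factorial(n_ab) - log_factorial(n_bb)
            + n_ab * log(2.0)
            + log_factorial(n_a) + log_factorial(n_b)
            - log_factorial(2 * n)
        )
        probs[n_ab] = exp(log_p)
    return probs


def hwe_exact_test(n_aa: int, n_ab: int, n_bb: int):
    """Return (p_value, direction) for observed genotype counts.

    direction is "excess", "deficiency" or "none", comparing the observed
    heterozygote count with its expectation under equilibrium.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n - n_a
    probs = het_probabilities(n_a, n)
    p_obs = probs[n_ab]
    # Sum over outcomes at most as probable as the observed one; the small
    # relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    # Expected heterozygote count given the allele counts.
    expected_het = n_a * n_b / (2 * n - 1)
    if n_ab > expected_het:
        direction = "excess"
    elif n_ab < expected_het:
        direction = "deficiency"
    else:
        direction = "none"
    return min(1.0, p_value), direction
```

For example, `hwe_exact_test(0, 4, 0)` (four individuals, all heterozygous) gives a p-value of about 0.314 with direction "excess". With so few individuals the test has little power, which is why genome-wide screens rely on large samples.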