
Inferring Heterozygosity from Ancient and Low Coverage Genomes

While genetic diversity can be quantified accurately from high coverage sequencing data, it is often desirable to obtain such estimates from data with low coverage, either to save costs or because of low DNA quality, as is observed for ancient samples. Here, we introduce a method to accurately infer...

Bibliographic Details
Main authors: Kousathanas, Athanasios; Leuenberger, Christoph; Link, Vivian; Sell, Christian; Burger, Joachim; Wegmann, Daniel
Format: Online Article Text
Language: English
Published: Genetics Society of America 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5223511/
https://www.ncbi.nlm.nih.gov/pubmed/27821432
http://dx.doi.org/10.1534/genetics.116.189985
author Kousathanas, Athanasios
Leuenberger, Christoph
Link, Vivian
Sell, Christian
Burger, Joachim
Wegmann, Daniel
collection PubMed
description While genetic diversity can be quantified accurately from high coverage sequencing data, it is often desirable to obtain such estimates from data with low coverage, either to save costs or because of low DNA quality, as is observed for ancient samples. Here, we introduce a method to accurately infer heterozygosity probabilistically from sequences with average coverage [Formula: see text] of a single individual. The method relaxes the infinite sites assumption of previous methods, does not require a reference sequence, except for the initial alignment of the sequencing data, and takes into account both variable sequencing errors and potential postmortem damage. It is thus also applicable to nonmodel organisms and ancient genomes. Since error rates as reported by sequencing machines are generally distorted and require recalibration, we also introduce a method to accurately infer recalibration parameters in the presence of postmortem damage. This method does not require knowledge about the underlying genome sequence, but instead works with haploid data (e.g., from the X-chromosome from mammalian males) and integrates over the unknown genotypes. Using extensive simulations we show that a few megabasepairs of haploid data are sufficient for accurate recalibration, even at average coverages as low as [Formula: see text]. At similar coverages, our method also produces very accurate estimates of heterozygosity down to [Formula: see text] within windows of about 1 Mbp. We further illustrate the usefulness of our approach by inferring genome-wide patterns of diversity for several ancient human samples, and we found that 3000–5000-year-old samples showed diversity patterns comparable to those of modern humans. In contrast, two European hunter-gatherer samples exhibited not only considerably lower levels of diversity than modern samples, but also highly distinct distributions of diversity along their genomes. Interestingly, these distributions were also very different between the two samples, supporting earlier conclusions of a highly diverse and structured population in Europe prior to the arrival of farming.
format Online
Article
Text
id pubmed-5223511
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5223511 2017-01-11 Inferring Heterozygosity from Ancient and Low Coverage Genomes Kousathanas, Athanasios Leuenberger, Christoph Link, Vivian Sell, Christian Burger, Joachim Wegmann, Daniel Genetics Investigations Genetics Society of America 2017-01 2016-11-07 /pmc/articles/PMC5223511/ /pubmed/27821432 http://dx.doi.org/10.1534/genetics.116.189985 Text en Copyright © 2017 by the Genetics Society of America. Available freely online through the author-supported open access option.
title Inferring Heterozygosity from Ancient and Low Coverage Genomes
topic Investigations
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5223511/
https://www.ncbi.nlm.nih.gov/pubmed/27821432
http://dx.doi.org/10.1534/genetics.116.189985