Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data

The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness i...

Bibliographic Details
Main Authors: Soraggi, Samuele, Wiuf, Carsten, Albrechtsen, Anders
Format: Online Article Text
Language: English
Published: Genetics Society of America 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5919751/
https://www.ncbi.nlm.nih.gov/pubmed/29196497
http://dx.doi.org/10.1534/g3.117.300192
_version_ 1783317699883958272
author Soraggi, Samuele
Wiuf, Carsten
Albrechtsen, Anders
collection PubMed
description The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1–10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates.
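The description above refers to the traditional D-statistic, which samples a single base per population and compares "specific coincidences of alleles" across four populations. The following is a minimal sketch of that classic single-base version only, not the authors' improved multi-read method. It assumes the usual ABBA/BABA site patterns on a tree (((H1, H2), H3), Outgroup) and the sign convention D = (nABBA - nBABA) / (nABBA + nBABA), which varies between tools. The function names `classify_site`, `d_statistic`, and `block_jackknife` are hypothetical, and the delete-one-block jackknife with equal-sized blocks is one common way to obtain a standard error for the Z-score, assumed here for illustration.

```python
import math


def classify_site(h1, h2, h3, out):
    """Classify one site from one sampled base per population.

    Returns "ABBA", "BABA", or None (site is uninformative).
    Assumed tree: (((H1, H2), H3), Outgroup).
    """
    if len({h1, h2, h3, out}) != 2:
        return None  # keep biallelic sites only
    if h1 == out and h2 == h3 and h1 != h2:
        return "ABBA"  # H2 shares the non-outgroup allele with H3
    if h2 == out and h1 == h3 and h1 != h2:
        return "BABA"  # H1 shares the non-outgroup allele with H3
    return None


def d_statistic(sites):
    """D = (nABBA - nBABA) / (nABBA + nBABA); NaN if no informative sites."""
    patterns = [classify_site(*s) for s in sites]
    abba = patterns.count("ABBA")
    baba = patterns.count("BABA")
    if abba + baba == 0:
        return float("nan")
    return (abba - baba) / (abba + baba)


def block_jackknife(sites, n_blocks=10):
    """Delete-one-block jackknife over equal-sized blocks of consecutive sites.

    Returns (D, standard error, Z). Trailing sites that do not fill a
    block are dropped. Z is NaN when the standard error is zero.
    """
    size = len(sites) // n_blocks
    if size == 0:
        raise ValueError("need at least one site per block")
    blocks = [sites[i * size:(i + 1) * size] for i in range(n_blocks)]
    used = [s for b in blocks for s in b]
    d_full = d_statistic(used)
    # D recomputed with each block left out in turn
    leave_one_out = [
        d_statistic([s for j, b in enumerate(blocks) if j != i for s in b])
        for i in range(n_blocks)
    ]
    mean_loo = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum(
        (d - mean_loo) ** 2 for d in leave_one_out
    )
    se = math.sqrt(var)
    z = d_full / se if se > 0 else float("nan")
    return d_full, se, z


if __name__ == "__main__":
    # Toy data: tuples are (H1, H2, H3, Outgroup) bases at each site.
    toy = [("A", "G", "G", "A")] * 3 + [("G", "A", "G", "A")] + [("A", "A", "A", "A")]
    print(d_statistic(toy))
```

Under the null hypothesis that the assumed tree is correct, ABBA and BABA sites are expected in equal numbers, so D should be close to zero and Z can be compared against a standard normal distribution.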
format Online
Article
Text
id pubmed-5919751
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5919751 2018-04-27 Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data Soraggi, Samuele Wiuf, Carsten Albrechtsen, Anders G3 (Bethesda) Investigations
Genetics Society of America 2017-12-01 /pmc/articles/PMC5919751/ /pubmed/29196497 http://dx.doi.org/10.1534/g3.117.300192 Text en Copyright © 2018 Soraggi et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data
topic Investigations
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5919751/
https://www.ncbi.nlm.nih.gov/pubmed/29196497
http://dx.doi.org/10.1534/g3.117.300192