Simulation-based Benchmarking of Ancient Haplotype Inference for Detecting Population Structure

Bibliographic Details
Main authors: Tretmanis, Jazeps Medina; Jay, Flora; Avila-Árcos, María C.; Huerta-Sanchez, Emilia
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10557694/
https://www.ncbi.nlm.nih.gov/pubmed/37808674
http://dx.doi.org/10.1101/2023.09.28.560049
collection PubMed
description Paleogenomic data has informed us about the movements, growth, and relationships of ancient populations. It has also given us context for medically relevant adaptations that appear in present-day humans due to introgression from other hominids, and it continues to help us characterize the evolutionary history of humans. However, ancient DNA (aDNA) presents several practical challenges: factors such as deamination, high fragmentation, environmental contamination, and low amounts of recoverable endogenous DNA make aDNA recovery and analysis more difficult than for modern DNA. Most aDNA studies use only SNP data, and only a few have made inferences about human demographic history from haplotype data, possibly because haplotype estimation (phasing) has not yet been systematically evaluated in the context of aDNA. Here, we evaluate how the unique challenges of aDNA can affect phasing quality. We also develop a software tool that simulates aDNA, accounting for both the characteristic features of aDNA and the evolutionary history of the population. We measured phasing error as a function of aDNA quality and demographic history, and found that low phasing error is achievable even for very ancient individuals (~400 generations in the past), provided that contamination is low and read depth is sufficient. Our results show that population splits or bottleneck events occurring between the reference and phased populations affect phasing quality, with bottlenecks resulting in the highest average error rates. Finally, we found that using estimated haplotypes, even if not completely accurate, outperforms using the simulated genotype data when reconstructing changes in population structure after population splits between present-day and ancient populations.
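This record does not say which phasing-error metric the study reports. As an illustration only, the sketch below computes the switch error rate, a common measure of phasing accuracy: among heterozygous sites, the fraction of adjacent pairs at which the inferred haplotypes flip phase relative to the true ones. It assumes biallelic sites coded 0/1 and known true haplotypes (as in a simulation); the function name `switch_error_rate` is hypothetical and is not taken from the authors' software.

```python
from typing import List, Sequence, Tuple

# A diploid individual's phased genotypes as two 0/1 haplotypes (assumed encoding).
Haplotypes = Tuple[Sequence[int], Sequence[int]]


def switch_error_rate(truth: Haplotypes, inferred: Haplotypes) -> float:
    """Fraction of adjacent heterozygous-site pairs whose relative phase is wrong.

    Sites that are homozygous in the truth are uninformative for phase and are
    skipped. Sites where the inferred genotype is homozygous (a genotyping
    error rather than a phasing error) are also skipped; this is an
    assumption, and other conventions score such sites separately.
    """
    t1, t2 = truth
    i1, i2 = inferred
    if not (len(t1) == len(t2) == len(i1) == len(i2)):
        raise ValueError("all haplotypes must have the same length")

    # For each usable heterozygous site, record whether inferred haplotype 1
    # carries the same allele as true haplotype 1 (the phase "orientation").
    orientations: List[bool] = []
    for a, b, x, y in zip(t1, t2, i1, i2):
        if a == b or x == y:
            continue
        orientations.append(x == a)

    if len(orientations) < 2:
        return 0.0  # no adjacent pair of informative sites, so no switches
    # A switch error is a change of orientation between consecutive sites.
    switches = sum(o1 != o2 for o1, o2 in zip(orientations, orientations[1:]))
    return switches / (len(orientations) - 1)


truth = ([0, 1, 1, 0, 1], [1, 0, 1, 1, 0])
inferred = ([0, 1, 1, 1, 0], [1, 0, 1, 0, 1])
print(switch_error_rate(truth, inferred))  # 0.3333333333333333 (1 switch over 3 pairs)
```

Swapping the two inferred haplotypes wholesale leaves the rate unchanged, because phase is only defined relative to neighboring heterozygous sites.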
id pubmed-10557694
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10557694 2023-10-07 bioRxiv Article. Cold Spring Harbor Laboratory 2023-10-03 /pmc/articles/PMC10557694/ /pubmed/37808674 http://dx.doi.org/10.1101/2023.09.28.560049 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.