An empirical evaluation of genotype imputation of ancient DNA

With capabilities of sequencing ancient DNA to high coverage often limited by sample quality or cost, imputation of missing genotypes presents a possibility to increase the power of inference as well as cost-effectiveness for the analysis of ancient data. However, the high degree of uncertainty ofte...

Bibliographic Details
Main Authors: Ausmees, Kristiina; Sanchez-Quinto, Federico; Jakobsson, Mattias; Nettelblad, Carl
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9157144/
https://www.ncbi.nlm.nih.gov/pubmed/35482488
http://dx.doi.org/10.1093/g3journal/jkac089
author Ausmees, Kristiina
Sanchez-Quinto, Federico
Jakobsson, Mattias
Nettelblad, Carl
author_facet Ausmees, Kristiina
Sanchez-Quinto, Federico
Jakobsson, Mattias
Nettelblad, Carl
author_sort Ausmees, Kristiina
collection PubMed
description With capabilities of sequencing ancient DNA to high coverage often limited by sample quality or cost, imputation of missing genotypes presents a possibility to increase the power of inference as well as cost-effectiveness for the analysis of ancient data. However, the high degree of uncertainty often associated with ancient DNA poses several methodological challenges, and performance of imputation methods in this context has not been fully explored. To gain further insights, we performed a systematic evaluation of imputation of ancient data using Beagle v4.0 and reference data from phase 3 of the 1000 Genomes project, investigating the effects of coverage, phased reference, and study sample size. Making use of five ancient individuals with high-coverage data available, we evaluated imputed data for accuracy, reference bias, and genetic affinities as captured by principal component analysis. We obtained genotype concordance levels of over 99% for data with 1× coverage, and similar levels of accuracy and reference bias at levels as low as 0.75×. Our findings suggest that using imputed data can be a realistic option for various population genetic analyses even for data in coverage ranges below 1×. We also show that a large and varied phased reference panel as well as the inclusion of low- to moderate-coverage ancient individuals in the study sample can increase imputation performance, particularly for rare alleles. In-depth analysis of imputed data with respect to genetic variants and allele frequencies gave further insight into the nature of errors arising during imputation, and can provide practical guidelines for postprocessing and validation prior to downstream analysis.
format Online
Article
Text
id pubmed-9157144
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9157144 2022-06-04 An empirical evaluation of genotype imputation of ancient DNA Ausmees, Kristiina Sanchez-Quinto, Federico Jakobsson, Mattias Nettelblad, Carl G3 (Bethesda) Investigation With capabilities of sequencing ancient DNA to high coverage often limited by sample quality or cost, imputation of missing genotypes presents a possibility to increase the power of inference as well as cost-effectiveness for the analysis of ancient data. However, the high degree of uncertainty often associated with ancient DNA poses several methodological challenges, and performance of imputation methods in this context has not been fully explored. To gain further insights, we performed a systematic evaluation of imputation of ancient data using Beagle v4.0 and reference data from phase 3 of the 1000 Genomes project, investigating the effects of coverage, phased reference, and study sample size. Making use of five ancient individuals with high-coverage data available, we evaluated imputed data for accuracy, reference bias, and genetic affinities as captured by principal component analysis. We obtained genotype concordance levels of over 99% for data with 1× coverage, and similar levels of accuracy and reference bias at levels as low as 0.75×. Our findings suggest that using imputed data can be a realistic option for various population genetic analyses even for data in coverage ranges below 1×. We also show that a large and varied phased reference panel as well as the inclusion of low- to moderate-coverage ancient individuals in the study sample can increase imputation performance, particularly for rare alleles. In-depth analysis of imputed data with respect to genetic variants and allele frequencies gave further insight into the nature of errors arising during imputation, and can provide practical guidelines for postprocessing and validation prior to downstream analysis.
Oxford University Press 2022-04-28 /pmc/articles/PMC9157144/ /pubmed/35482488 http://dx.doi.org/10.1093/g3journal/jkac089 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Investigation
Ausmees, Kristiina
Sanchez-Quinto, Federico
Jakobsson, Mattias
Nettelblad, Carl
An empirical evaluation of genotype imputation of ancient DNA
title An empirical evaluation of genotype imputation of ancient DNA
title_full An empirical evaluation of genotype imputation of ancient DNA
title_fullStr An empirical evaluation of genotype imputation of ancient DNA
title_full_unstemmed An empirical evaluation of genotype imputation of ancient DNA
title_short An empirical evaluation of genotype imputation of ancient DNA
title_sort empirical evaluation of genotype imputation of ancient dna
topic Investigation
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9157144/
https://www.ncbi.nlm.nih.gov/pubmed/35482488
http://dx.doi.org/10.1093/g3journal/jkac089