
hapCon: estimating contamination of ancient genomes by copying from reference haplotypes

MOTIVATION: Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is preserved poorly, making such data prone to contamination from other human DNA. Therefore, it is important to rule out substantial contamination before proceeding...


Bibliographic Details
Main authors: Huang, Yilei, Ringbauer, Harald
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344841/
https://www.ncbi.nlm.nih.gov/pubmed/35695771
http://dx.doi.org/10.1093/bioinformatics/btac390
author Huang, Yilei
Ringbauer, Harald
collection PubMed
description MOTIVATION: Human ancient DNA (aDNA) studies have surged in recent years, revolutionizing the study of the human past. Typically, aDNA is preserved poorly, making such data prone to contamination from other human DNA. Therefore, it is important to rule out substantial contamination before proceeding to downstream analysis. As most aDNA samples can only be sequenced to low coverage (<1× average depth), computational methods that can robustly estimate contamination in the low coverage regime are needed. However, the ultra low-coverage regime (0.1× and below) remains challenging for existing approaches. RESULTS: We present a new method to estimate contamination in aDNA for male modern humans. It utilizes a Li & Stephens haplotype copying model for haploid X chromosomes, with mismatches modeled as errors or contamination. We assessed this new approach, hapCon, on simulated and down-sampled empirical aDNA data. Our experiments demonstrate that hapCon outperforms a commonly used tool for estimating male X contamination (ANGSD), with substantially lower variance and narrower confidence intervals, especially in the low coverage regime. We found that hapCon provides useful contamination estimates for coverages as low as 0.1× for SNP capture data (1240k) and 0.02× for whole genome sequencing data, substantially extending the coverage limit of previous male X chromosome-based contamination estimation methods. Our experiments demonstrate that hapCon has little bias for contamination up to 25–30% as long as the contaminating source is specified within continental genetic variation, and that its application range extends to human aDNA as old as ∼45 000 years and various global ancestries. AVAILABILITY AND IMPLEMENTATION: We provide hapCon as part of a Python package (hapROH), which is available from the Python Package Index (https://pypi.org/project/hapROH) and can be installed via pip.
The documentation provides example use cases as blueprints for custom applications (https://haproh.readthedocs.io/en/latest/hapCon.html). The program can analyze either BAM files or pileup files produced with samtools. An implementation of our software (hapCon) using Python and C is deposited at https://github.com/hyl317/hapROH. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9344841
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9344841 2022-08-03 hapCon: estimating contamination of ancient genomes by copying from reference haplotypes Huang, Yilei Ringbauer, Harald Bioinformatics Original Papers Oxford University Press 2022-06-13 /pmc/articles/PMC9344841/ /pubmed/35695771 http://dx.doi.org/10.1093/bioinformatics/btac390 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title hapCon: estimating contamination of ancient genomes by copying from reference haplotypes
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9344841/
https://www.ncbi.nlm.nih.gov/pubmed/35695771
http://dx.doi.org/10.1093/bioinformatics/btac390