CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data

To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typical low geno...

Bibliographic Details
Main authors: Söylev, Arda, Çokoglu, Sevim Seda, Koptekin, Dilek, Alkan, Can, Somel, Mehmet
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873172/
https://www.ncbi.nlm.nih.gov/pubmed/36516232
http://dx.doi.org/10.1371/journal.pcbi.1010788
author Söylev, Arda
Çokoglu, Sevim Seda
Koptekin, Dilek
Alkan, Can
Somel, Mehmet
collection PubMed
description To date, ancient genome analyses have been largely confined to the study of single nucleotide polymorphisms (SNPs). Copy number variants (CNVs) are a major contributor to disease and to evolutionary adaptation, but identifying CNVs in ancient shotgun-sequenced genomes is hampered by typically low genome coverage (<1×) and short fragments (<80 bp), which prevent standard CNV detection software from being applied effectively to ancient genomes. Here we present CONGA, tailored for genotyping CNVs at low coverage. Simulations and down-sampling experiments suggest that CONGA can genotype deletions >1 kbp with F-scores >0.75 at ≥1× coverage, and can distinguish between heterozygous and homozygous states. We used CONGA to genotype 10,002 outgroup-ascertained deletions across a heterogeneous set of 71 ancient human genomes spanning the last 50,000 years, produced using variable experimental protocols. A fraction of these (21/71) display divergent deletion profiles unrelated to their population origin, but attributable to technical factors such as coverage and read length. The majority of the sample (50/71), despite originating from nine different laboratories and having coverages ranging from 0.44× to 26× (median 4×) and average read lengths of 52 to 121 bp (median 69), exhibit coherent deletion frequencies. Across these 50 genomes, inter-individual genetic diversity measured using SNPs is highly correlated with that measured using CONGA-genotyped deletions. CONGA-genotyped deletions also display purifying selection signatures, as expected. CONGA thus paves the way for systematic CNV analyses in ancient genomes, despite the technical challenges posed by low and variable genome coverage.
format Online
Article
Text
id pubmed-9873172
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9873172 2023-01-25 CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data. Söylev, Arda; Çokoglu, Sevim Seda; Koptekin, Dilek; Alkan, Can; Somel, Mehmet. PLoS Comput Biol. Research Article.
Public Library of Science 2022-12-14 /pmc/articles/PMC9873172/ /pubmed/36516232 http://dx.doi.org/10.1371/journal.pcbi.1010788 Text en © 2022 Söylev et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title CONGA: Copy number variation genotyping in ancient genomes and low-coverage sequencing data
topic Research Article