
How array design creates SNP ascertainment bias

Single nucleotide polymorphisms (SNPs), genotyped with arrays, have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be...


Bibliographic Details
Main authors: Geibel, Johannes; Reimer, Christian; Weigend, Steffen; Weigend, Annett; Pook, Torsten; Simianer, Henner
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8009414/
https://www.ncbi.nlm.nih.gov/pubmed/33784304
http://dx.doi.org/10.1371/journal.pone.0245178
author Geibel, Johannes
Reimer, Christian
Weigend, Steffen
Weigend, Annett
Pook, Torsten
Simianer, Henner
author_sort Geibel, Johannes
collection PubMed
description Single nucleotide polymorphisms (SNPs) genotyped with arrays have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be biased towards variants present in the populations involved in developing the respective array. This affects population genetic estimators and is known as SNP ascertainment bias. We investigated factors contributing to ascertainment bias in array development by redesigning the Axiom™ Genome-Wide Chicken Array in silico and evaluating changes in allele frequency spectra and heterozygosity estimates in a stepwise manner. The development process was shown to reduce rare alleles sequentially. This was mainly caused by the identification of SNPs in a limited set of populations and by a within-population selection of common SNPs when aiming for equidistant spacing. These effects were less severe with a larger discovery panel. Additionally, expected heterozygosity was shown to be massively overestimated in general for the ascertained SNP sets. For the original array, this overestimation was 24% higher for populations involved in the discovery process than for populations not involved. The same was observed after the SNP discovery step in the redesign. However, an unequal contribution of populations during SNP selection can mask this effect but also adds uncertainty. Finally, we make suggestions for the design of specialized arrays for large-scale projects where whole genome re-sequencing techniques are still too expensive.
format Online
Article
Text
id pubmed-8009414
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8009414 2021-04-07 How array design creates SNP ascertainment bias Geibel, Johannes; Reimer, Christian; Weigend, Steffen; Weigend, Annett; Pook, Torsten; Simianer, Henner PLoS One Research Article
Public Library of Science 2021-03-30 /pmc/articles/PMC8009414/ /pubmed/33784304 http://dx.doi.org/10.1371/journal.pone.0245178 Text en © 2021 Geibel et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title How array design creates SNP ascertainment bias
topic Research Article