
A joint use of pooling and imputation for genotyping SNPs

BACKGROUND: Despite continuing technological advances, the cost for large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest making use of pooling, a group testing technique, to drop the amount...


Bibliographic Details
Main authors: Clouard, Camille, Ausmees, Kristiina, Nettelblad, Carl
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563787/
https://www.ncbi.nlm.nih.gov/pubmed/36229780
http://dx.doi.org/10.1186/s12859-022-04974-7
author Clouard, Camille
Ausmees, Kristiina
Nettelblad, Carl
collection PubMed
description BACKGROUND: Despite continuing technological advances, the cost of large-scale genotyping of a large number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest using pooling, a group testing technique, to reduce the number of SNP arrays needed. We believe that this will be of greatest importance for non-model organisms, for which cost-efficient large-scale chips and high-quality reference genomes are more limited, in applications such as wildlife monitoring and plant and animal breeding, but the approach is in essence species-agnostic. The proposed approach consists of grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is smaller than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance on pooled data of the Beagle algorithm and of a local likelihood-aware phasing algorithm, closely modeled on MaCH, that we implemented.
RESULTS: We conduct simulations based on human data from the 1000 Genomes Project to aid comparison with other imputation studies. Based on the simulated data, we find that pooling affects the genotype frequencies of the markers that are directly identifiable without imputation. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data, outperforming the Beagle imputation model, which reaches 80% concordance. We observe that the pooling design gives higher concordance for rare variants than the traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts.
CONCLUSIONS: We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation on human data. These results could find applications in any context where genotyping costs limit the study size, such as marker-assisted selection in plant breeding.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04974-7.
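The pooling idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' estimation algorithm: it assumes a hypothetical 4x4 block in which each row and each column of samples forms one pool (8 pools for 16 samples), an idealised assay in which a pool reads as homozygous only when every sample in it is homozygous for the same allele, and a purely deterministic decoding rule. All function names and the example genotypes are invented for illustration. Genotypes are coded 0, 1, 2 (count of the alternate allele), and samples that cannot be resolved from the pools alone are left as `None`, which is where a probabilistic estimate and imputation would take over.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # genotypes coded 0, 1, 2 (alternate-allele count)


def pool_outcome(genotypes: List[int]) -> int:
    """Idealised pool readout (assumption): 0 or 2 if all samples are
    homozygous for the same allele, otherwise 1 (both alleles detected)."""
    if all(g == 0 for g in genotypes):
        return 0
    if all(g == 2 for g in genotypes):
        return 2
    return 1


def pool_grid(grid: Grid) -> Tuple[List[int], List[int]]:
    """Form one pool per row and one pool per column of the block."""
    rows = [pool_outcome(row) for row in grid]
    cols = [pool_outcome(list(col)) for col in zip(*grid)]
    return rows, cols


def decode(rows: List[int], cols: List[int]) -> List[List[Optional[int]]]:
    """Deterministic decoding: a sample in a homozygous pool must share
    that genotype; a sample whose two pools both read 1 stays unresolved."""
    decoded: List[List[Optional[int]]] = []
    for r in rows:
        line: List[Optional[int]] = []
        for c in cols:
            if r == 0 or c == 0:
                line.append(0)
            elif r == 2 or c == 2:
                line.append(2)
            else:
                line.append(None)  # ambiguous from pooling alone
        decoded.append(line)
    return decoded


def resolved_fraction(decoded: List[List[Optional[int]]]) -> float:
    """Share of samples whose genotype is determined by the pools alone."""
    cells = [g for line in decoded for g in line]
    return sum(g is not None for g in cells) / len(cells)


# Hypothetical block at one marker: one heterozygous carrier and one
# homozygous-alternate carrier among 16 samples.
TRUTH: Grid = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    row_pools, col_pools = pool_grid(TRUTH)
    for line in decode(row_pools, col_pools):
        print(line)
```

In this toy block the carriers make several rows and columns read as mixed, so both the carriers and some non-carriers at the intersections remain unresolved. This loosely mirrors the abstract's remark that pooling changes the genotype frequencies among directly identifiable markers.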
format Online
Article
Text
id pubmed-9563787
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9563787 2022-10-15 A joint use of pooling and imputation for genotyping SNPs Clouard, Camille Ausmees, Kristiina Nettelblad, Carl BMC Bioinformatics Research BioMed Central 2022-10-13 /pmc/articles/PMC9563787/ /pubmed/36229780 http://dx.doi.org/10.1186/s12859-022-04974-7 Text en © The Author(s) 2022
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A joint use of pooling and imputation for genotyping SNPs
topic Research