
A joint use of pooling and imputation for genotyping SNPs

BACKGROUND: Despite continuing technological advances, the cost for large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest making use of pooling, a group testing technique, to drop the amount...


Bibliographic Details
Main Authors:	Clouard, Camille, Ausmees, Kristiina, Nettelblad, Carl
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563787/ https://www.ncbi.nlm.nih.gov/pubmed/36229780 http://dx.doi.org/10.1186/s12859-022-04974-7

_version_	1784808486766379008
author	Clouard, Camille Ausmees, Kristiina Nettelblad, Carl
author_facet	Clouard, Camille Ausmees, Kristiina Nettelblad, Carl
author_sort	Clouard, Camille
collection	PubMed
description	BACKGROUND: Despite continuing technological advances, the cost of large-scale genotyping of a large number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest using pooling, a group testing technique, to reduce the number of SNP arrays needed. We believe that this will be of the greatest importance for non-model organisms, for which cost-efficient large-scale chips and high-quality reference genomes are less readily available, for example in wildlife monitoring and in plant and animal breeding, but the approach is in essence species-agnostic. The proposed approach consists of grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is less than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance from pooled data using the Beagle algorithm and a local likelihood-aware phasing algorithm, closely modeled on MaCH, that we implemented. RESULTS: We conduct simulations based on human data from the 1000 Genomes Project, to aid comparison with other imputation studies. Based on the simulated data, we find that, without imputation, pooling alters the genotype frequencies of the directly identifiable markers. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data, outperforming the Beagle imputation model, which reaches 80% concordance. We observe that the pooling design gives higher concordance for rare variants than the traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts. CONCLUSIONS: We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation on human data. These results could find applications in any context where genotyping costs limit the study size, such as marker-assisted selection in plant breeding. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04974-7.
format	Online Article Text
id	pubmed-9563787
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9563787 2022-10-15 A joint use of pooling and imputation for genotyping SNPs Clouard, Camille Ausmees, Kristiina Nettelblad, Carl BMC Bioinformatics Research BioMed Central 2022-10-13 /pmc/articles/PMC9563787/ /pubmed/36229780 http://dx.doi.org/10.1186/s12859-022-04974-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Research Clouard, Camille Ausmees, Kristiina Nettelblad, Carl A joint use of pooling and imputation for genotyping SNPs
title	A joint use of pooling and imputation for genotyping SNPs
title_sort	joint use of pooling and imputation for genotyping snps
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563787/ https://www.ncbi.nlm.nih.gov/pubmed/36229780 http://dx.doi.org/10.1186/s12859-022-04974-7
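
The pooling step summarized in the description can be illustrated with a minimal sketch. The record does not specify the pooling design or the authors' estimation algorithm, so everything below is an assumption made for illustration: a hypothetical 4x4 row-column block (16 samples tested in 8 pools), genotypes coded as alternate-allele counts 0/1/2, a pool readout that only reports which alleles are present, and a naive decoding rule that recovers a genotype only when one of the sample's pools is homozygous. The function names (`pool_readout`, `pool_block`, `decode_block`, `concordance`) are likewise hypothetical.

```python
# Illustrative sketch only: a hypothetical row-column pooling block,
# not the authors' implementation or their statistical estimator.

def pool_readout(genotypes):
    """Readout of one pool: 0 if only the reference allele is present,
    2 if only the alternate allele is present, 1 if both are detected."""
    if all(g == 0 for g in genotypes):
        return 0
    if all(g == 2 for g in genotypes):
        return 2
    return 1


def pool_block(block):
    """Pool each row and each column of a square block of genotypes."""
    rows = [pool_readout(row) for row in block]
    cols = [pool_readout(col) for col in zip(*block)]
    return rows, cols


def decode_block(rows, cols):
    """Naive decoding: a sample is determined only if its row pool or its
    column pool is homozygous (0 or 2). Undetermined entries stay None."""
    decoded = [[None] * len(cols) for _ in rows]
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            for readout in (r, c):
                if readout in (0, 2):
                    decoded[i][j] = readout
    return decoded


def concordance(truth, predicted):
    """Fraction of genotypes predicted exactly; None counts as a miss."""
    pairs = [(t, p) for t_row, p_row in zip(truth, predicted)
             for t, p in zip(t_row, p_row)]
    return sum(t == p for t, p in pairs) / len(pairs)


# One heterozygous carrier among 16 samples at a single marker.
block = [[0, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
rows, cols = pool_block(block)        # 8 pool readouts instead of 16 assays
decoded = decode_block(rows, cols)    # only the carrier stays undetermined
print(rows, cols, concordance(block, decoded))
```

In this toy block, 15 of the 16 genotypes follow directly from the 8 pool readouts, and the single carrier is left undetermined. Entries like that one are the kind of ambiguous genotype that the description says is handled by a statistical estimate followed by imputation. The concordance value printed here describes only this toy example and is unrelated to the 93% and 80% figures reported in the description.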
