A joint use of pooling and imputation for genotyping SNPs
BACKGROUND: Despite continuing technological advances, the cost for large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest making use of pooling, a group testing technique, to drop the amount...
Main authors: | Clouard, Camille; Ausmees, Kristiina; Nettelblad, Carl |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563787/ https://www.ncbi.nlm.nih.gov/pubmed/36229780 http://dx.doi.org/10.1186/s12859-022-04974-7 |
_version_ | 1784808486766379008 |
---|---|
author | Clouard, Camille Ausmees, Kristiina Nettelblad, Carl |
author_facet | Clouard, Camille Ausmees, Kristiina Nettelblad, Carl |
author_sort | Clouard, Camille |
collection | PubMed |
description | BACKGROUND: Despite continuing technological advances, the cost for large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest making use of pooling, a group testing technique, to drop the amount of SNP arrays needed. We believe that this will be of the greatest importance for non-model organisms with more limited resources in terms of cost-efficient large-scale chips and high-quality reference genomes, such as application in wildlife monitoring, plant and animal breeding, but it is in essence species-agnostic. The proposed approach consists in grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is less than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance from pooled data with the Beagle algorithm, and a local likelihood-aware phasing algorithm closely modeled on MaCH that we implemented. RESULTS: We conduct simulations based on human data from the 1000 Genomes Project, to aid comparison with other imputation studies. Based on the simulated data, we find that pooling impacts the genotype frequencies of the directly identifiable markers, without imputation. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data, thus it outperforms the Beagle imputation model which reaches 80% concordance. We observe that the pooling design gives higher concordance for the rare variants than traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts. CONCLUSIONS: We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation on human data. These results could find potential applications in any context where the genotyping costs form a limiting factor on the study size, such as in marker-assisted selection in plant breeding. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04974-7. |
format | Online Article Text |
id | pubmed-9563787 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9563787 2022-10-15 A joint use of pooling and imputation for genotyping SNPs Clouard, Camille Ausmees, Kristiina Nettelblad, Carl BMC Bioinformatics Research BioMed Central 2022-10-13 /pmc/articles/PMC9563787/ /pubmed/36229780 http://dx.doi.org/10.1186/s12859-022-04974-7 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Clouard, Camille Ausmees, Kristiina Nettelblad, Carl A joint use of pooling and imputation for genotyping SNPs |
title | A joint use of pooling and imputation for genotyping SNPs |
title_sort | joint use of pooling and imputation for genotyping snps |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9563787/ https://www.ncbi.nlm.nih.gov/pubmed/36229780 http://dx.doi.org/10.1186/s12859-022-04974-7 |
work_keys_str_mv | AT clouardcamille ajointuseofpoolingandimputationforgenotypingsnps AT ausmeeskristiina ajointuseofpoolingandimputationforgenotypingsnps AT nettelbladcarl ajointuseofpoolingandimputationforgenotypingsnps AT clouardcamille jointuseofpoolingandimputationforgenotypingsnps AT ausmeeskristiina jointuseofpoolingandimputationforgenotypingsnps AT nettelbladcarl jointuseofpoolingandimputationforgenotypingsnps |
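
The pooling step summarized in the description can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the 4 x 4 row-and-column block layout (16 samples tested in 8 pools), the coding of genotypes as 0, 1, 2 alternate-allele counts, the use of -1 for an undetermined genotype, and the function names `pool_block`, `decode_block`, and `concordance`. It is not the authors' estimation algorithm; it only shows why some genotypes are directly identifiable from pool outcomes while others are left for an imputation step.

```python
import numpy as np


def pool_block(genotypes):
    """Simulate pooling a square block of samples along its rows and columns.

    A pool is assumed to report only which alleles are present:
    0 if every member is homozygous reference, 2 if every member is
    homozygous alternate, and 1 if both alleles appear in the pool.
    """
    g = np.asarray(genotypes)

    def pooled(values):
        if np.all(values == 0):
            return 0
        if np.all(values == 2):
            return 2
        return 1

    rows = [pooled(g[i, :]) for i in range(g.shape[0])]
    cols = [pooled(g[:, j]) for j in range(g.shape[1])]
    return rows, cols


def decode_block(rows, cols):
    """Recover the genotypes that follow directly from the pool outcomes.

    A sample in a homozygous pool must share that pool's genotype.
    Samples whose row pool and column pool are both mixed stay
    undetermined and are marked -1.
    """
    decoded = np.full((len(rows), len(cols)), -1)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            if r != 1:
                decoded[i, j] = r
            elif c != 1:
                decoded[i, j] = c
    return decoded


def concordance(truth, predicted):
    """Fraction of genotypes that match; undetermined calls count as mismatches."""
    return float(np.mean(np.asarray(truth) == np.asarray(predicted)))


# One heterozygous carrier among 16 otherwise homozygous-reference samples.
block = np.zeros((4, 4), dtype=int)
block[1, 2] = 1
row_pools, col_pools = pool_block(block)
decoded = decode_block(row_pools, col_pools)
```

In this toy block, 8 pools stand in for 16 individual assays. The 15 non-carriers are resolved directly, while the carrier at the intersection of the two mixed pools stays undetermined; this is the kind of gap that the record's description hands over to a statistical estimate and then to imputation.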