A diverse ancestrally-matched reference panel increases genotype imputation accuracy in a underrepresented population
Variant imputation, a common practice in genome-wide association studies, relies on reference panels to infer unobserved genotypes. Multiple public reference panels are currently available with variations in size, sequencing depth, and represented populations. Currently, limited data exist regarding...
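Main authors: | Mauleekoonphairoj, John; Tongsima, Sissades; Khongphatthanayothin, Apichai; Jurgens, Sean J.; Zimmerman, Dominic S.; Sutjaporn, Boosamas; Wandee, Pharawee; Bezzina, Connie R.; Nademanee, Koonlawee; Poovorawan, Yong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390539/ https://www.ncbi.nlm.nih.gov/pubmed/37524845 http://dx.doi.org/10.1038/s41598-023-39429-3 |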
_version_ | 1785082498401697792 |
---|---|
author | Mauleekoonphairoj, John; Tongsima, Sissades; Khongphatthanayothin, Apichai; Jurgens, Sean J.; Zimmerman, Dominic S.; Sutjaporn, Boosamas; Wandee, Pharawee; Bezzina, Connie R.; Nademanee, Koonlawee; Poovorawan, Yong |
author_sort | Mauleekoonphairoj, John |
collection | PubMed |
description | Variant imputation, a common practice in genome-wide association studies, relies on reference panels to infer unobserved genotypes. Multiple public reference panels are currently available, varying in size, sequencing depth, and represented populations. Limited data exist on how public reference panels perform when used to impute populations underrepresented in those panels. Here, we compare the performance of several public reference panels (1000 Genomes Project, Haplotype Reference Consortium, GenomeAsia 100K, and the recent Trans-Omics for Precision Medicine (TOPMed) program) when used to impute samples from the Thai population. Genotype yields were assessed, and imputation accuracies were examined by comparison with high-depth whole genome sequencing data of the same samples. We found that imputation using the TOPMed panel yielded the largest number of variants (~271 million). Despite being the smallest in size, GenomeAsia 100K achieved the best imputation accuracy, with a median genotype concordance rate of 0.97. For rare variants, GenomeAsia 100K also offered the best accuracy, although rare variants were less accurately imputable than common variants (30.3% reduction in concordance rates). The high accuracy observed when using GenomeAsia 100K is likely attributable to the diverse representation of populations genetically similar to the study cohort, emphasizing the benefits of sequencing populations classically underrepresented in human genomics. |
format | Online Article Text |
id | pubmed-10390539 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10390539 2023-08-02 Sci Rep Article Nature Publishing Group UK 2023-07-31 /pmc/articles/PMC10390539/ /pubmed/37524845 http://dx.doi.org/10.1038/s41598-023-39429-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | A diverse ancestrally-matched reference panel increases genotype imputation accuracy in a underrepresented population |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390539/ https://www.ncbi.nlm.nih.gov/pubmed/37524845 http://dx.doi.org/10.1038/s41598-023-39429-3 |
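
The description field above reports imputation accuracy as a median genotype concordance rate against whole genome sequencing of the same samples. The record does not give the authors' exact computation, so the following is only a minimal sketch of one common way such a metric can be computed. The function names, the site keys, and the tuple encoding of genotypes (0 = reference allele, 1 = alternate allele) are all illustrative assumptions, not taken from the article.

```python
from statistics import median


def genotype_concordance(imputed, truth):
    """Fraction of compared sites where the imputed genotype matches the sequenced one.

    Both arguments are dicts mapping a site key to an unphased genotype,
    encoded here (an assumption) as a tuple of allele codes such as (0, 1).
    Sites missing from either call set are skipped rather than counted as errors.
    """
    compared = 0
    matches = 0
    for site, true_gt in truth.items():
        imputed_gt = imputed.get(site)
        if imputed_gt is None or true_gt is None:
            continue
        compared += 1
        # Sort alleles so that (0, 1) and (1, 0) count as the same unphased genotype.
        if sorted(imputed_gt) == sorted(true_gt):
            matches += 1
    return matches / compared if compared else float("nan")


def median_concordance(sample_pairs):
    """Median of per-sample concordance over (imputed, truth) pairs."""
    return median(genotype_concordance(imp, tru) for imp, tru in sample_pairs)
```

Under these assumptions, a sample with three comparable sites of which two match has a concordance of 2/3, and the panel-level figure is the median of such per-sample values. Skipping sites absent from the imputed set is a design choice: it separates accuracy from genotype yield, which the description reports as a distinct measure.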