A diverse ancestrally-matched reference panel increases genotype imputation accuracy in a underrepresented population

Variant imputation, a common practice in genome-wide association studies, relies on reference panels to infer unobserved genotypes. Multiple public reference panels are currently available with variations in size, sequencing depth, and represented populations. Currently, limited data exist regarding...

Bibliographic Details
Main Authors: Mauleekoonphairoj, John, Tongsima, Sissades, Khongphatthanayothin, Apichai, Jurgens, Sean J., Zimmerman, Dominic S., Sutjaporn, Boosamas, Wandee, Pharawee, Bezzina, Connie R., Nademanee, Koonlawee, Poovorawan, Yong
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390539/
https://www.ncbi.nlm.nih.gov/pubmed/37524845
http://dx.doi.org/10.1038/s41598-023-39429-3
author Mauleekoonphairoj, John
Tongsima, Sissades
Khongphatthanayothin, Apichai
Jurgens, Sean J.
Zimmerman, Dominic S.
Sutjaporn, Boosamas
Wandee, Pharawee
Bezzina, Connie R.
Nademanee, Koonlawee
Poovorawan, Yong
author_sort Mauleekoonphairoj, John
collection PubMed
description Variant imputation, a common practice in genome-wide association studies, relies on reference panels to infer unobserved genotypes. Multiple public reference panels are currently available, varying in size, sequencing depth, and represented populations. Limited data exist on how public reference panels perform when used to impute populations underrepresented in the panel. Here, we compare the performance of several public reference panels (1000 Genomes Project, Haplotype Reference Consortium, GenomeAsia 100K, and the recent Trans-Omics for Precision Medicine (TOPMed) program) when used to impute samples from the Thai population. Genotype yields were assessed, and imputation accuracies were examined by comparison with high-depth whole genome sequencing data of the same samples. We found that imputation using the TOPMed panel yielded the largest number of variants (~271 million). Despite being the smallest in size, GenomeAsia 100K achieved the best imputation accuracy, with a median genotype concordance rate of 0.97. For rare variants, GenomeAsia 100K also offered the best accuracy, although rare variants were imputed less accurately than common variants (30.3% reduction in concordance rates). The high accuracy observed with GenomeAsia 100K is likely attributable to its diverse representation of populations genetically similar to the study cohort, emphasizing the benefits of sequencing populations classically underrepresented in human genomics.
format Online
Article
Text
id pubmed-10390539
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10390539 2023-08-02 Sci Rep Article
Nature Publishing Group UK 2023-07-31 /pmc/articles/PMC10390539/ /pubmed/37524845 http://dx.doi.org/10.1038/s41598-023-39429-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title A diverse ancestrally-matched reference panel increases genotype imputation accuracy in a underrepresented population
topic Article