Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools

BACKGROUND: Imputation has become a standard approach in genome-wide association studies (GWAS) to infer in silico untyped markers. Although feasibility for common variants imputation is well established, we aimed to assess rare and ultra-rare variants’ imputation in an admixed Caribbean Hispanic po...

Bibliographic Details
Main authors: Sariya, Sanjeev, Lee, Joseph H., Mayeux, Richard, Vardarajan, Badri N., Reyes-Dumeyer, Dolly, Manly, Jennifer J., Brickman, Adam M., Lantigua, Rafael, Medrano, Martin, Jimenez-Velazquez, Ivonne Z., Tosto, Giuseppe
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6456789/
https://www.ncbi.nlm.nih.gov/pubmed/31001313
http://dx.doi.org/10.3389/fgene.2019.00239
author Sariya, Sanjeev
Lee, Joseph H.
Mayeux, Richard
Vardarajan, Badri N.
Reyes-Dumeyer, Dolly
Manly, Jennifer J.
Brickman, Adam M.
Lantigua, Rafael
Medrano, Martin
Jimenez-Velazquez, Ivonne Z.
Tosto, Giuseppe
author_sort Sariya, Sanjeev
collection PubMed
description BACKGROUND: Imputation has become a standard approach in genome-wide association studies (GWAS) to infer untyped markers in silico. Although the feasibility of imputing common variants is well established, we aimed to assess the imputation of rare and ultra-rare variants in an admixed Caribbean Hispanic (CH) population. METHODS: We evaluated imputation accuracy in CH (N = 1,000), focusing on rare (0.1% ≤ minor allele frequency (MAF) ≤ 1%) and ultra-rare (MAF < 0.1%) variants. We used two reference panels, the Haplotype Reference Consortium (HRC; N = 27,165) and the 1000 Genomes Project (1000G phase 3; N = 2,504), together with multiple phasing (SHAPEIT, Eagle2) and imputation (IMPUTE2, MaCH-Admix) algorithms. To assess imputation quality, we reported: (a) high-quality variant counts according to the imputation tools’ internal indexes (e.g., IMPUTE2 “Info” ≥ 80%); (b) a Wilcoxon signed-rank test comparing imputation quality for genotyped variants that were masked and imputed; (c) Cohen’s kappa coefficient to test agreement between imputed and whole-exome sequencing (WES) variants; (d) imputation of the G206A mutation in PSEN1 (ultra-rare in the general population and more frequent in CH), followed by confirmation genotyping. We also tested ancestry proportions (European, African and Native American) against WES-imputation mismatches using Poisson regression. RESULTS: SHAPEIT2 retrieved a higher percentage of imputed high-quality variants than Eagle2 (rare: 51.02% vs. 48.60%; ultra-rare: 0.66% vs. 0.65%; Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 with the HRC panel outperformed 1000G (64.50% vs. 59.17% and 1.69% vs. 0.75% for high-quality rare and ultra-rare variants, respectively; Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 outperformed MaCH-Admix. Compared to 1000G, HRC imputation retrieved a higher number of high-quality rare and ultra-rare variants, despite showing lower agreement between imputed and WES variants (e.g., rare: 98.86% for HRC vs. 99.02% for 1000G). High kappa (K = 0.99) was observed for both reference panels. Twelve G206A mutation carriers were imputed, and all were validated by confirmation genotyping. African ancestry was associated with higher imputation errors for uncommon and rare variants (p-value < 1e-05). CONCLUSION: Reference panels with larger numbers of haplotypes can improve imputation quality for rare and ultra-rare variants in admixed populations such as CH. Ethnic composition is an important predictor of imputation accuracy, with higher African ancestry associated with poorer imputation accuracy.
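The abstract names two quantities that are easy to make concrete: the MAF bins (ultra-rare below 0.1%, rare from 0.1% to 1%) and Cohen's kappa between imputed and WES genotype calls. The Python sketch below illustrates both under stated assumptions. It is not the authors' pipeline. The function names `maf_class` and `cohens_kappa`, the 0/1/2 genotype coding, and the toy genotype lists are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def maf_class(maf):
    """Bin a minor allele frequency using the thresholds quoted in the abstract.

    ultra-rare: MAF < 0.1%; rare: 0.1% <= MAF <= 1%; anything else: "other".
    """
    if maf < 0.001:
        return "ultra-rare"
    if maf <= 0.01:
        return "rare"
    return "other"


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of categorical calls.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from the marginal frequencies.
    """
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    counts_a, counts_b = Counter(calls_a), Counter(calls_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both sequences contain a single identical category: kappa is undefined.
        return math.nan
    return (observed - expected) / (1.0 - expected)


# Toy data (assumed coding: 0, 1, 2 = copies of the minor allele).
imputed = [0, 0, 1, 2, 0, 1, 0, 0]
wes = [0, 0, 1, 2, 0, 0, 0, 0]
kappa = cohens_kappa(imputed, wes)
print(f"kappa = {kappa:.4f}")
```

Kappa corrects raw concordance for chance agreement, which matters for rare variants: when almost every call is homozygous reference, two call sets can agree on most genotypes by chance alone.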
format Online
Article
Text
id pubmed-6456789
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6456789 2019-04-18 Front Genet Genetics Frontiers Media S.A. 2019-04-03 /pmc/articles/PMC6456789/ /pubmed/31001313 http://dx.doi.org/10.3389/fgene.2019.00239 Text en Copyright © 2019 Sariya, Lee, Mayeux, Vardarajan, Reyes-Dumeyer, Manly, Brickman, Lantigua, Medrano, Jimenez-Velazquez and Tosto. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools
topic Genetics