Opening the Black Box of Imputation Software to Study the Impact of Reference Panel Composition on Performance

Genotype imputation is widely used to enrich genetic datasets. The operation relies on panels of known reference haplotypes, typically with whole-genome sequencing data. How to choose a reference panel has been widely studied and it is essential to have a panel that is well matched to the individual...

Bibliographic Details
Main authors: Dekeyser, Thibault; Génin, Emmanuelle; Herzig, Anthony F.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9956390/
https://www.ncbi.nlm.nih.gov/pubmed/36833337
http://dx.doi.org/10.3390/genes14020410
author Dekeyser, Thibault
Génin, Emmanuelle
Herzig, Anthony F.
collection PubMed
description Genotype imputation is widely used to enrich genetic datasets. The operation relies on panels of known reference haplotypes, typically with whole-genome sequencing data. How to choose a reference panel has been widely studied and it is essential to have a panel that is well matched to the individuals who require missing genotype imputation. However, it is broadly accepted that such an imputation panel will have an enhanced performance with the inclusion of diversity (haplotypes from many different populations). We investigate this observation by examining, in fine detail, exactly which reference haplotypes are contributing at different regions of the genome. This is achieved using a novel method of inserting synthetic genetic variation into the reference panel in order to track the performance of leading imputation algorithms. We show that while diversity may globally improve imputation accuracy, there can be occasions where incorrect genotypes are imputed following the inclusion of more diverse haplotypes in the reference panel. We, however, demonstrate a technique for retaining and benefitting from the diversity in the reference panel whilst avoiding the occasional adverse effects on imputation accuracy. What is more, our results more clearly elucidate the role of diversity in a reference panel than has been shown in previous studies.
format Online
Article
Text
id pubmed-9956390
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9956390 2023-02-25 Genes (Basel) Article MDPI 2023-02-04 /pmc/articles/PMC9956390/ /pubmed/36833337 http://dx.doi.org/10.3390/genes14020410 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Opening the Black Box of Imputation Software to Study the Impact of Reference Panel Composition on Performance
topic Article