Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing


Bibliographic Details
Main authors: Sheng, Xin, Xia, Lucy, Cahoon, Jordan L., Conti, David V., Haiman, Christopher A., Kachuri, Linda, Chiang, Charleston W.K.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9709082/
https://www.ncbi.nlm.nih.gov/pubmed/36465187
http://dx.doi.org/10.1016/j.xhgg.2022.100159
_version_ 1784841067550474240
author Sheng, Xin
Xia, Lucy
Cahoon, Jordan L.
Conti, David V.
Haiman, Christopher A.
Kachuri, Linda
Chiang, Charleston W.K.
collection PubMed
description Over the last two decades, the human reference genome has undergone multiple updates as we complete a linear representation of our genome. Two versions of the human reference are currently used in the biomedical literature, GRCh37/hg19 and GRCh38. Conversions between these versions are critical for quality control, imputation, and association analysis. In the present study, we show that single-nucleotide variants (SNVs) in regions inverted between different builds of the reference genome are often mishandled bioinformatically. Depending on the array type, SNVs are found in approximately 2–5 Mb of the genome that are inverted between reference builds. Coordinate conversions of these variants are mishandled by both the TOPMed imputation server and routine in-house quality control pipelines, leading to underrecognized downstream analytical consequences. Specifically, we observe that undetected allelic conversion errors for palindromic (i.e., A/T or C/G) variants in these inverted regions would destabilize the local haplotype structure, leading to loss of imputation accuracy and power in association analyses. Though only a small proportion of the genome is affected, these regions include important disease susceptibility variants. For example, the p value of a known locus associated with prostate cancer on chromosome 10 (chr10) would weaken from 2.86 × 10^−7 to 0.0011 in a case-control analysis of 20,286 Africans and African Americans (10,643 cases and 9,643 controls). We devise a straightforward heuristic based on the popular tool liftOver that can easily detect and correct these variants in the inverted regions between genome builds to locally improve imputation accuracy.
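The following Python sketch illustrates why palindromic variants are the hard case described in the abstract. It is not the authors' liftOver-based heuristic, whose details are not given in this record. The function names, the interval list, and the coordinates are hypothetical, and the sketch assumes single-base alleles and that a variant inside an inverted region must have its alleles complemented when moving between builds.

```python
# Illustrative sketch only; not the authors' pipeline.
# All names and coordinates below are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_alleles(alleles):
    """Return the alleles as read from the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def is_palindromic(alleles):
    """True for A/T or C/G SNVs: complementing leaves the allele set unchanged."""
    return set(alleles) == set(complement_alleles(alleles))


def in_inverted_region(chrom, pos, inverted_regions):
    """Check a position against (chrom, start, end) intervals, ends inclusive."""
    return any(c == chrom and start <= pos <= end
               for c, start, end in inverted_regions)


def convert_alleles(chrom, pos, alleles, inverted_regions):
    """Complement alleles only when the variant lies in an inverted region."""
    if in_inverted_region(chrom, pos, inverted_regions):
        return complement_alleles(alleles)
    return alleles


# Hypothetical inverted interval, for demonstration only.
regions = [("chr10", 100, 200)]

# Non-palindromic SNV: the allele set changes, so a missed flip is detectable.
print(convert_alleles("chr10", 150, ("A", "G"), regions))
# Palindromic SNV: the allele set is the same before and after the flip,
# so a missed flip silently swaps which allele is which.
print(convert_alleles("chr10", 150, ("A", "T"), regions))
```

For the A/G variant, an unflipped record would show alleles that no longer match the new reference, so ordinary quality control can catch it. For the A/T variant, both the flipped and unflipped records contain the alleles A and T, which is why such errors can pass unnoticed unless the inverted regions are checked explicitly.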
format Online
Article
Text
id pubmed-9709082
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9709082 2022-12-01 HGG Adv Article Elsevier 2022-11-11 /pmc/articles/PMC9709082/ /pubmed/36465187 http://dx.doi.org/10.1016/j.xhgg.2022.100159 Text en © 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing
topic Article