Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing
Over the last two decades, the human reference genome has undergone multiple updates as we complete a linear representation of our genome. Two versions of human references are currently used in the biomedical literature, GRCh37/hg19 and GRCh38. Conversions between these versions are critical for qua...
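Main authors: | Sheng, Xin; Xia, Lucy; Cahoon, Jordan L.; Conti, David V.; Haiman, Christopher A.; Kachuri, Linda; Chiang, Charleston W.K. |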
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9709082/ https://www.ncbi.nlm.nih.gov/pubmed/36465187 http://dx.doi.org/10.1016/j.xhgg.2022.100159 |
_version_ | 1784841067550474240 |
---|---|
author | Sheng, Xin; Xia, Lucy; Cahoon, Jordan L.; Conti, David V.; Haiman, Christopher A.; Kachuri, Linda; Chiang, Charleston W.K. |
author_sort | Sheng, Xin |
collection | PubMed |
description | Over the last two decades, the human reference genome has undergone multiple updates as we complete a linear representation of our genome. Two versions of the human reference are currently used in the biomedical literature, GRCh37/hg19 and GRCh38. Conversions between these versions are critical for quality control, imputation, and association analysis. In the present study, we show that single-nucleotide variants (SNVs) in regions inverted between different builds of the reference genome are often mishandled bioinformatically. Depending on the array type, SNVs are found in approximately 2–5 Mb of the genome that are inverted between reference builds. Coordinate conversions of these variants are mishandled by both the TOPMed imputation server and routine in-house quality control pipelines, leading to underrecognized downstream analytical consequences. Specifically, we observe that undetected allelic conversion errors for palindromic (i.e., A/T or C/G) variants in these inverted regions would destabilize the local haplotype structure, leading to loss of imputation accuracy and power in association analyses. Though only a small proportion of the genome is affected, these regions include important disease susceptibility variants. For example, the p value of a known locus associated with prostate cancer on chromosome 10 (chr10) would weaken from 2.86 × 10^−7 to 0.0011 in a case-control analysis of 20,286 Africans and African Americans (10,643 cases and 9,643 controls). We devise a straightforward heuristic based on the popular tool, liftOver, that can easily detect and correct these variants in the inverted regions between genome builds to locally improve imputation accuracy. |
format | Online Article Text |
id | pubmed-9709082 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9709082 2022-12-01 Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing Sheng, Xin; Xia, Lucy; Cahoon, Jordan L.; Conti, David V.; Haiman, Christopher A.; Kachuri, Linda; Chiang, Charleston W.K. HGG Adv Article Over the last two decades, the human reference genome has undergone multiple updates as we complete a linear representation of our genome. Two versions of the human reference are currently used in the biomedical literature, GRCh37/hg19 and GRCh38. Conversions between these versions are critical for quality control, imputation, and association analysis. In the present study, we show that single-nucleotide variants (SNVs) in regions inverted between different builds of the reference genome are often mishandled bioinformatically. Depending on the array type, SNVs are found in approximately 2–5 Mb of the genome that are inverted between reference builds. Coordinate conversions of these variants are mishandled by both the TOPMed imputation server and routine in-house quality control pipelines, leading to underrecognized downstream analytical consequences. Specifically, we observe that undetected allelic conversion errors for palindromic (i.e., A/T or C/G) variants in these inverted regions would destabilize the local haplotype structure, leading to loss of imputation accuracy and power in association analyses. Though only a small proportion of the genome is affected, these regions include important disease susceptibility variants. For example, the p value of a known locus associated with prostate cancer on chromosome 10 (chr10) would weaken from 2.86 × 10^−7 to 0.0011 in a case-control analysis of 20,286 Africans and African Americans (10,643 cases and 9,643 controls). We devise a straightforward heuristic based on the popular tool, liftOver, that can easily detect and correct these variants in the inverted regions between genome builds to locally improve imputation accuracy. Elsevier 2022-11-11 /pmc/articles/PMC9709082/ /pubmed/36465187 http://dx.doi.org/10.1016/j.xhgg.2022.100159 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Inverted genomic regions between reference genome builds in humans impact imputation accuracy and decrease the power of association testing |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9709082/ https://www.ncbi.nlm.nih.gov/pubmed/36465187 http://dx.doi.org/10.1016/j.xhgg.2022.100159 |
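The abstract describes a liftOver-based heuristic for detecting and correcting SNVs in regions inverted between genome builds, and notes that palindromic (A/T or C/G) variants are the ones whose allelic errors go undetected. The Python sketch below illustrates that idea only; it is not the authors' implementation. It assumes each converted variant carries a strand field that reads "-" when the variant falls in an inverted region, and the names `LiftedSnv`, `is_palindromic`, `correct_alleles`, and `silently_mishandled` are hypothetical.

```python
from dataclasses import dataclass

# Watson-Crick complements for the four bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass(frozen=True)
class LiftedSnv:
    """Hypothetical record for one SNV after coordinate conversion."""
    snv_id: str
    strand: str   # assumed: "-" marks a variant in an inverted region
    allele1: str
    allele2: str


def is_palindromic(allele1: str, allele2: str) -> bool:
    """True for A/T or C/G pairs, whose complement is the same allele pair."""
    return COMPLEMENT.get(allele1) == allele2


def correct_alleles(snv: LiftedSnv) -> LiftedSnv:
    """Complement both alleles of a variant reported on the minus strand."""
    if snv.strand != "-":
        return snv
    return LiftedSnv(
        snv.snv_id, "+", COMPLEMENT[snv.allele1], COMPLEMENT[snv.allele2]
    )


def silently_mishandled(snv: LiftedSnv) -> bool:
    """Inverted-region palindromic SNVs: an allele-set check cannot catch them."""
    return snv.strand == "-" and is_palindromic(snv.allele1, snv.allele2)


# Made-up example variants, for illustration only.
examples = [
    LiftedSnv("snv_a", "-", "A", "T"),  # inverted and palindromic
    LiftedSnv("snv_b", "+", "A", "G"),  # not inverted
    LiftedSnv("snv_c", "-", "A", "G"),  # inverted, not palindromic
]
corrected = [correct_alleles(s) for s in examples]
flagged = [s.snv_id for s in examples if silently_mishandled(s)]
```

For a non-palindromic variant such as A/G, a missed strand flip yields T/C, which no longer matches the reference panel's alleles and is therefore easy to spot. For A/T the flipped pair is T/A, the same two alleles in swapped roles, which is why strand information from the conversion step, rather than the alleles themselves, is needed to detect the error.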