Similarities and differences between variants called with human reference genome HG19 or HG38

BACKGROUND: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigor...

Bibliographic Details
Main Authors: Pan, Bohu; Kusko, Rebecca; Xiao, Wenming; Zheng, Yuanting; Liu, Zhichao; Xiao, Chunlin; Sakkiah, Sugunadevi; Guo, Wenjing; Gong, Ping; Zhang, Chaoyang; Ge, Weigong; Shi, Leming; Tong, Weida; Hong, Huixiao
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419332/
https://www.ncbi.nlm.nih.gov/pubmed/30871461
http://dx.doi.org/10.1186/s12859-019-2620-0
author Pan, Bohu
Kusko, Rebecca
Xiao, Wenming
Zheng, Yuanting
Liu, Zhichao
Xiao, Chunlin
Sakkiah, Sugunadevi
Guo, Wenjing
Gong, Ping
Zhang, Chaoyang
Ge, Weigong
Shi, Leming
Tong, Weida
Hong, Huixiao
collection PubMed
description BACKGROUND: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. METHODS: We conducted analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. RESULTS: The conversion rates from HG38 to HG19 (average 95%) were lower than the conversion rates from HG19 to HG38 (average 99%). The conversion rates varied slightly among the various calling pipelines. Around 1.5% SNVs were discordantly converted between HG19 or HG38. The conversions from HG38 to HG19 had more SNVs which failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52% observed versus 42% expected). CONCLUSION: A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. To summarize, our findings suggest caution when translating identified SNVs between different versions of the human reference genome. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2620-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6419332
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-64193322019-03-27 Similarities and differences between variants called with human reference genome HG19 or HG38 Pan, Bohu Kusko, Rebecca Xiao, Wenming Zheng, Yuanting Liu, Zhichao Xiao, Chunlin Sakkiah, Sugunadevi Guo, Wenjing Gong, Ping Zhang, Chaoyang Ge, Weigong Shi, Leming Tong, Weida Hong, Huixiao BMC Bioinformatics Research BACKGROUND: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. METHODS: We conducted analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. RESULTS: The conversion rates from HG38 to HG19 (average 95%) were lower than the conversion rates from HG19 to HG38 (average 99%). The conversion rates varied slightly among the various calling pipelines. Around 1.5% SNVs were discordantly converted between HG19 or HG38. The conversions from HG38 to HG19 had more SNVs which failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52% observed versus 42% expected). CONCLUSION: A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. 
To summarize, our findings suggest caution when translating identified SNVs between different versions of the human reference genome. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2620-0) contains supplementary material, which is available to authorized users. BioMed Central 2019-03-14 /pmc/articles/PMC6419332/ /pubmed/30871461 http://dx.doi.org/10.1186/s12859-019-2620-0 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Similarities and differences between variants called with human reference genome HG19 or HG38
topic Research