Similarities and differences between variants called with human reference genome HG19 or HG38
BACKGROUND: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigor...
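Main authors: | Pan, Bohu; Kusko, Rebecca; Xiao, Wenming; Zheng, Yuanting; Liu, Zhichao; Xiao, Chunlin; Sakkiah, Sugunadevi; Guo, Wenjing; Gong, Ping; Zhang, Chaoyang; Ge, Weigong; Shi, Leming; Tong, Weida; Hong, Huixiao |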
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419332/ https://www.ncbi.nlm.nih.gov/pubmed/30871461 http://dx.doi.org/10.1186/s12859-019-2620-0 |
_version_ | 1783403920421289984 |
---|---|
author | Pan, Bohu; Kusko, Rebecca; Xiao, Wenming; Zheng, Yuanting; Liu, Zhichao; Xiao, Chunlin; Sakkiah, Sugunadevi; Guo, Wenjing; Gong, Ping; Zhang, Chaoyang; Ge, Weigong; Shi, Leming; Tong, Weida; Hong, Huixiao |
author_sort | Pan, Bohu |
collection | PubMed |
description | BACKGROUND: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. METHODS: We conducted an analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly, we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. RESULTS: The conversion rates from HG38 to HG19 (average 95%) were lower than the conversion rates from HG19 to HG38 (average 99%). The conversion rates varied slightly among the various calling pipelines. Around 1.5% of SNVs were discordantly converted between HG19 and HG38. The conversions from HG38 to HG19 had more SNVs that failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52% observed versus 42% expected). CONCLUSION: A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. To summarize, our findings suggest caution when translating identified SNVs between different versions of the human reference genome. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2620-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6419332 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6419332 2019-03-27 Similarities and differences between variants called with human reference genome HG19 or HG38 Pan, Bohu; Kusko, Rebecca; Xiao, Wenming; Zheng, Yuanting; Liu, Zhichao; Xiao, Chunlin; Sakkiah, Sugunadevi; Guo, Wenjing; Gong, Ping; Zhang, Chaoyang; Ge, Weigong; Shi, Leming; Tong, Weida; Hong, Huixiao BMC Bioinformatics Research BioMed Central 2019-03-14 /pmc/articles/PMC6419332/ /pubmed/30871461 http://dx.doi.org/10.1186/s12859-019-2620-0 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Similarities and differences between variants called with human reference genome HG19 or HG38 |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419332/ https://www.ncbi.nlm.nih.gov/pubmed/30871461 http://dx.doi.org/10.1186/s12859-019-2620-0 |
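
The abstract reports conversion rates and discordant rates without stating how they were computed. The following is a minimal Python sketch of one plausible reading, not the authors' method. It assumes that the conversion rate is the fraction of called SNVs that a liftover tool maps to the other reference, and that a converted SNV is discordant when it does not match any SNV called directly against the target reference. The names `Snv`, `conversion_rate`, `discordant_rate`, and the toy coordinates are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical SNV key: (chromosome, position, reference allele, alternate allele).
Snv = Tuple[str, int, str, str]


def conversion_rate(lifted: Dict[Snv, Optional[Snv]]) -> float:
    """Fraction of source SNVs with a converted coordinate (None = failed)."""
    if not lifted:
        return 0.0
    converted = sum(1 for target in lifted.values() if target is not None)
    return converted / len(lifted)


def discordant_rate(lifted: Dict[Snv, Optional[Snv]],
                    called_on_target: Set[Snv]) -> float:
    """Fraction of converted SNVs absent from the calls made on the target
    reference (an assumed definition of 'discordant')."""
    converted = [target for target in lifted.values() if target is not None]
    if not converted:
        return 0.0
    discordant = sum(1 for target in converted if target not in called_on_target)
    return discordant / len(converted)


# Toy data, invented for illustration only.
lifted = {
    ("chr1", 100, "A", "G"): ("chr1", 150, "A", "G"),
    ("chr1", 200, "C", "T"): ("chr1", 250, "C", "T"),
    ("chr2", 300, "G", "A"): None,  # failed conversion
    ("chr3", 400, "T", "C"): ("chr3", 480, "T", "C"),
}
called_on_target = {
    ("chr1", 150, "A", "G"),
    ("chr3", 480, "T", "C"),
}

print(conversion_rate(lifted))
print(discordant_rate(lifted, called_on_target))
```

Keeping failed conversions as `None` rather than dropping them keeps the denominator of the conversion rate explicit. Other denominators, such as all SNVs called on either reference, would give different figures.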