
A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome

Most genome benchmark studies utilize hg38 as a reference genome (based on Caucasian and African samples) and ‘NA12878’ (a Caucasian sequencing read) for comparison. Here, we aimed to elucidate whether 1) ethnic match or mismatch between the reference genome and sequencing reads produces a distinct...


Bibliographic Details
Main authors: Park, HyeonSeul; Gim, JungSoo
Format: Online Article Text
Language: English
Published: American Journal Experts 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10029055/
https://www.ncbi.nlm.nih.gov/pubmed/36945432
http://dx.doi.org/10.21203/rs.3.rs-2580940/v1
author Park, HyeonSeul
Gim, JungSoo
collection PubMed
description Most genome benchmark studies use hg38 as the reference genome (based on Caucasian and African samples) and ‘NA12878’ (sequencing reads from a Caucasian individual) for comparison. Here, we aimed to elucidate whether 1) an ethnic match or mismatch between the reference genome and the sequencing reads produces distinct results, and 2) there is an optimal workflow for single-genome data. We assessed the performance of variant calling pipelines using two reference genomes (hg38 and a Korean genome) and two whole-genome sequencing (WGS) read sets of different ethnic origins: Caucasian (NA12878) and Korean. The pipelines used BWA-mem and Novoalign as mapping tools and GATK4, Strelka2, DeepVariant, and Samtools as variant callers. Using hg38 led to better performance (based on precision and recall), regardless of the ethnic origin of the WGS reads. Novoalign + GATK4 showed the best performance with both WGS datasets. We assessed pipeline efficiency by removing the duplicate-marking (markduplicate) step; all pipelines except Novoalign + DeepVariant maintained their performance. Novoalign identified more variants, both overall and in the MHC region of chr6, when combined with GATK4. We found no evidence that a reference genome of a different ethnic origin improves variant calling performance for single WGS reads, which re-validates the utility of hg38. We recommend Novoalign + GATK4 without duplicate marking for single PCR-free WGS data.
format Online
Article
Text
id pubmed-10029055
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Journal Experts
record_format MEDLINE/PubMed
spelling pubmed-10029055 2023-03-22 A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome Park, HyeonSeul Gim, JungSoo Res Sq Article
American Journal Experts 2023-03-06 /pmc/articles/PMC10029055/ /pubmed/36945432 http://dx.doi.org/10.21203/rs.3.rs-2580940/v1 Text en License: This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10029055/
https://www.ncbi.nlm.nih.gov/pubmed/36945432
http://dx.doi.org/10.21203/rs.3.rs-2580940/v1