A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome
Most genome benchmark studies utilize hg38 as a reference genome (based on Caucasian and African samples) and ‘NA12878’ (a Caucasian sequencing read) for comparison. Here, we aimed to elucidate whether 1) ethnic match or mismatch between the reference genome and sequencing reads produces a distinct...
Main Authors: | Park, HyeonSeul; Gim, JungSoo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Journal Experts 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10029055/ https://www.ncbi.nlm.nih.gov/pubmed/36945432 http://dx.doi.org/10.21203/rs.3.rs-2580940/v1 |
_version_ | 1784910067500318720 |
---|---|
author | Park, HyeonSeul Gim, JungSoo |
collection | PubMed |
description | Most genome benchmark studies utilize hg38 as a reference genome (based on Caucasian and African samples) and ‘NA12878’ (a Caucasian sequencing read) for comparison. Here, we aimed to elucidate whether 1) an ethnic match or mismatch between the reference genome and the sequencing reads produces a distinct result; and 2) there is an optimal workflow for single-genome data. We assessed the performance of variant calling pipelines using two reference genomes (hg38 and a Korean genome) and two sets of whole-genome sequencing (WGS) reads from different ethnic origins: Caucasian (NA12878) and Korean. The pipelines used BWA-MEM and Novoalign as mapping tools and GATK4, Strelka2, DeepVariant, and Samtools as variant callers. Using hg38 led to better performance (based on precision and recall), regardless of the ethnic origin of the WGS reads. Novoalign + GATK4 demonstrated the best performance on both WGS datasets. We assessed pipeline efficiency by removing the duplicate-marking step, and all pipelines except Novoalign + DeepVariant maintained their performance. Novoalign identified more variants, both overall and in the MHC region of chr6, when combined with GATK4. No evidence suggested that a reference of a different ethnic origin improves variant calling performance for single WGS reads, which re-validates the utility of hg38. We recommend using Novoalign + GATK4 without duplicate marking for single PCR-free WGS data. |
format | Online Article Text |
id | pubmed-10029055 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Journal Experts |
record_format | MEDLINE/PubMed |
spelling | pubmed-10029055 2023-03-22 A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome Park, HyeonSeul Gim, JungSoo Res Sq Article American Journal Experts 2023-03-06 /pmc/articles/PMC10029055/ /pubmed/36945432 http://dx.doi.org/10.21203/rs.3.rs-2580940/v1 Text en This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | A comparative investigation of variant calling and genotyping for a single non-Caucasian whole genome |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10029055/ https://www.ncbi.nlm.nih.gov/pubmed/36945432 http://dx.doi.org/10.21203/rs.3.rs-2580940/v1 |
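
The description in this record ranks pipelines by precision and recall. As a rough illustration of what those two metrics measure for a variant call set, here is a minimal Python sketch. It is not the authors' evaluation code: the `precision_recall` helper, the `(chrom, pos, ref, alt)` tuple layout, and the toy variants are all assumptions made for this example, and exact-match comparison is a simplification (real benchmarking typically also normalizes variant representation and checks genotypes).

```python
from typing import NamedTuple

# Hypothetical variant key for this sketch: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    recall: float


def precision_recall(called: set[Variant], truth: set[Variant]) -> Metrics:
    """Compare a call set against a truth set by exact match on the variant key."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return Metrics(precision, recall)


# Toy data, invented for illustration only.
truth = {
    ("chr6", 100, "A", "G"),
    ("chr6", 200, "C", "T"),
    ("chr1", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
called = {
    ("chr6", 100, "A", "G"),
    ("chr6", 200, "C", "T"),
    ("chr1", 60, "G", "C"),
}

m = precision_recall(called, truth)
print(f"precision={m.precision:.3f} recall={m.recall:.3f}")
```

In this toy case two of the three calls match the truth set, and two of the four true variants are recovered, so precision is 2/3 and recall is 1/2. A pipeline that calls more variants can raise recall while lowering precision, which is why the two are reported together.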