Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data

Bibliographic Details
Main authors: Xiang, Xudong, Lu, Bowen, Song, Dongyang, Li, Jie, Shu, Kunxian, Pu, Dan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10665316/
https://www.ncbi.nlm.nih.gov/pubmed/37993475
http://dx.doi.org/10.1038/s41598-023-47135-3
Collection: PubMed
Description: Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, detecting them with next-generation sequencing (NGS) is challenging because of the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical tools for calling low-frequency variants have been proposed, but a systematic comparison of their performance has not yet been carried out. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) for their ability to call single nucleotide variants (SNVs) with allele frequencies as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of sequencing depth, MAGERI was the fastest, while smCounter2 consistently took the longest to finish variant calling. Overall, DeepSNVMiner and UMI-VarCal performed best, with sensitivity and precision of 88% and 100% (DeepSNVMiner) and 84% and 100% (UMI-VarCal). In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
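The abstract compares callers by sensitivity and precision. As a minimal sketch of those two standard definitions, assuming a call set and a truth set of SNVs keyed by chromosome, position, and alleles (the `Variant` tuple and the `score_calls` helper are hypothetical, not the input format of any evaluated tool and not the authors' evaluation code), the metrics can be computed like this:

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One SNV, identified by position and alleles (hypothetical representation)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def score_calls(called: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Return (sensitivity, precision) of a call set against a truth set.

    sensitivity = TP / (TP + FN): share of true variants that were called.
    precision   = TP / (TP + FP): share of calls that are true variants.
    An empty truth set or empty call set yields 0.0 for the undefined metric.
    """
    tp = len(called & truth)  # calls matching a known variant
    fp = len(called - truth)  # calls with no matching known variant
    fn = len(truth - called)  # known variants that were missed
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if called else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical toy data: five true SNVs, four of them called, plus one false call.
    truth = {Variant("chr1", 100 + i, "C", "T") for i in range(5)}
    called = {Variant("chr1", 100 + i, "C", "T") for i in range(4)}
    called.add(Variant("chr2", 500, "G", "A"))
    print(score_calls(called, truth))  # (0.8, 0.8)
```

Matching on the exact (chrom, pos, ref, alt) tuple is the simplest choice; a real benchmarking pipeline may also need to normalize variant representation before comparing, which this sketch does not attempt.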
ID: pubmed-10665316
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep (Scientific Reports), Article, published 2023-11-22
Rights: © The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.