
Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data

Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from...


Bibliographic Details
Main Authors:	Xiang, Xudong, Lu, Bowen, Song, Dongyang, Li, Jie, Shu, Kunxian, Pu, Dan
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10665316/ https://www.ncbi.nlm.nih.gov/pubmed/37993475 http://dx.doi.org/10.1038/s41598-023-47135-3

collection	PubMed
description	Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical tools for calling low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) considering their capability to call single nucleotide variants (SNVs) with allelic frequency as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI showed the fastest analysis, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed the best, with sensitivity and precision of 88% and 100% for DeepSNVMiner and 84% and 100% for UMI-VarCal. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
format	Online Article Text
id	pubmed-10665316
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10665316 2023-11-22 Sci Rep Article Nature Publishing Group UK 2023-11-22 /pmc/articles/PMC10665316/ /pubmed/37993475 http://dx.doi.org/10.1038/s41598-023-47135-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


Similar Items