
smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers

Bibliographic Details
Main authors: Xu, Chang; Gu, Xiujing; Padmanabhan, Raghavendra; Wu, Zhong; Peng, Quan; DiCarlo, John; Wang, Yexun
Format: Online Article Text
Language: English
Published: Oxford University Press, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6477992/
https://www.ncbi.nlm.nih.gov/pubmed/30192920
http://dx.doi.org/10.1093/bioinformatics/bty790
author Xu, Chang
Gu, Xiujing
Padmanabhan, Raghavendra
Wu, Zhong
Peng, Quan
DiCarlo, John
Wang, Yexun
collection PubMed
description MOTIVATION: Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most of the sequencing errors can be corrected. However, errors before UMI tagging, such as DNA polymerase errors during end repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits to UMI-based variant calling. RESULTS: We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade from the current version of smCounter. Compared to smCounter, smCounter2 features lower detection limit that decreases from 1 to 0.5%, better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2’s superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test to determine whether the allele frequency of the putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved using novel repetitive region filters that were specifically designed for UMI data. AVAILABILITY AND IMPLEMENTATION: The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
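The description states that UMIs allow most sequencing errors to be corrected. A minimal sketch of the usual idea follows: reads that share a UMI are collapsed into one consensus base per UMI by majority vote. The function name `umi_consensus` and the `min_fraction` agreement threshold are hypothetical and are not taken from smCounter2 or the qiaseq-dna pipeline. The example also illustrates the stated limitation: an error introduced before UMI tagging is copied into every read carrying that UMI, so a vote cannot remove it.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def umi_consensus(
    reads: Iterable[Tuple[str, str]], min_fraction: float = 0.75
) -> Dict[str, Optional[str]]:
    """Collapse reads that share a UMI into one consensus base per UMI.

    reads: (umi, base) pairs observed at a single genomic position.
    min_fraction: hypothetical agreement threshold; if the majority base is
    supported by less than this fraction of the UMI's reads, the UMI gets None.
    """
    bases_by_umi = defaultdict(list)
    for umi, base in reads:
        bases_by_umi[umi].append(base)

    consensus: Dict[str, Optional[str]] = {}
    for umi, bases in bases_by_umi.items():
        base, count = Counter(bases).most_common(1)[0]
        consensus[umi] = base if count / len(bases) >= min_fraction else None
    return consensus


# A sequencing error (T) in one of five reads of UMI "AACG" is outvoted, whereas
# an error made before tagging would be present in all reads of its UMI and survive.
reads = [("AACG", "A")] * 4 + [("AACG", "T")] + [("GGTC", "T")] * 3
calls = umi_consensus(reads)
```

After collapsing, each UMI contributes one observation rather than one per read, which is why UMI counts rather than read counts feed the downstream test.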
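The description places a statistical test at the core of smCounter2: is the putative variant's allele frequency significantly above the background error rate? The sketch below illustrates that kind of test with a plain one-sided binomial model of UMI counts. This is a simplification chosen for illustration, not smCounter2's actual error model, which the authors fitted using an independent dataset. The function name and the log-space computation are assumptions made for the example.

```python
from math import exp, lgamma, log, log1p


def upper_tail_binomial_pvalue(alt_umis: int, total_umis: int, error_rate: float) -> float:
    """One-sided p-value P(X >= alt_umis) for X ~ Binomial(total_umis, error_rate).

    Simplified background model (assumption): with no true variant, each UMI
    independently shows the alternate allele with probability error_rate.
    """
    if not 0 <= alt_umis <= total_umis:
        raise ValueError("need 0 <= alt_umis <= total_umis")
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be in (0, 1)")
    log_p, log_q = log(error_rate), log1p(-error_rate)
    # Work in log space so binomial coefficients do not overflow at deep coverage.
    log_norm = lgamma(total_umis + 1)
    tail = 0.0
    for k in range(alt_umis, total_umis + 1):
        log_pmf = (log_norm - lgamma(k + 1) - lgamma(total_umis - k + 1)
                   + k * log_p + (total_umis - k) * log_q)
        tail += exp(log_pmf)
    return min(tail, 1.0)  # guard against rounding slightly above 1


# Usage: 12 alternate-allele UMIs out of 2,000 against an assumed 0.1% error rate.
p_value = upper_tail_binomial_pvalue(12, 2000, 0.001)
```

With 2,000 UMIs and an assumed 0.1% background rate, about 2 error UMIs are expected, so 12 alternate-allele UMIs give a very small p-value. The cutoff for calling a variant is a separate choice that this sketch does not model.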
id pubmed-6477992
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Bioinformatics (Original Papers), Oxford University Press. Published online 2018-09-06; issue date 2019-04-15.
© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Papers