smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers
MOTIVATION: Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most of the sequencing errors can be corrected. However, errors before UMI tagging, such as DNA polymerase errors during end repair...
Main authors: | Xu, Chang; Gu, Xiujing; Padmanabhan, Raghavendra; Wu, Zhong; Peng, Quan; DiCarlo, John; Wang, Yexun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6477992/ https://www.ncbi.nlm.nih.gov/pubmed/30192920 http://dx.doi.org/10.1093/bioinformatics/bty790 |
_version_ | 1783413116041691136 |
---|---|
author | Xu, Chang Gu, Xiujing Padmanabhan, Raghavendra Wu, Zhong Peng, Quan DiCarlo, John Wang, Yexun |
author_sort | Xu, Chang |
collection | PubMed |
description | MOTIVATION: Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most of the sequencing errors can be corrected. However, errors before UMI tagging, such as DNA polymerase errors during end repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits on UMI-based variant calling. RESULTS: We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade from the current version of smCounter. Compared to smCounter, smCounter2 features a lower detection limit (reduced from 1% to 0.5%), better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2’s superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test to determine whether the allele frequency of the putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved using novel repetitive region filters that were specifically designed for UMI data. AVAILABILITY AND IMPLEMENTATION: The entire pipeline is available at https://github.com/qiaseq/qiaseq-dna under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6477992 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6477992 2019-04-25 smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers Xu, Chang Gu, Xiujing Padmanabhan, Raghavendra Wu, Zhong Peng, Quan DiCarlo, John Wang, Yexun Bioinformatics Original Papers Oxford University Press 2019-04-15 2018-09-06 /pmc/articles/PMC6477992/ /pubmed/30192920 http://dx.doi.org/10.1093/bioinformatics/bty790 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6477992/ https://www.ncbi.nlm.nih.gov/pubmed/30192920 http://dx.doi.org/10.1093/bioinformatics/bty790 |
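
The abstract describes a statistical test of whether a putative variant's allele frequency is significantly above the background error rate. The record does not give the model itself, so the following is only a minimal sketch of that general idea, assuming a plain one-sided binomial test on UMI counts as a stand-in. The function names, the `background_rate` parameter, and the `alpha` cutoff are all hypothetical and are not taken from smCounter2.

```python
import math


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that large UMI depths do not
    underflow to zero before they are summed.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(k, n + 1)
    ]
    peak = max(terms)
    total = math.exp(peak) * sum(math.exp(t - peak) for t in terms)
    return min(1.0, total)


def exceeds_background(variant_umis, total_umis, background_rate, alpha=1e-6):
    """Hypothetical helper: flag a site whose variant UMI count is unlikely
    under the assumed background error rate.

    Returns (is_significant, p_value). This is an illustrative binomial
    test, not the background model used by smCounter2.
    """
    p_value = binom_upper_tail(variant_umis, total_umis, background_rate)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Illustrative numbers only: 20 variant UMIs out of 4000 (0.5% allele
    # frequency) against an assumed background rate of 1e-4 per UMI.
    print(exceeds_background(20, 4000, 1e-4))
    # A single variant UMI at the same depth is compatible with background.
    print(exceeds_background(1, 4000, 1e-4))
```

A fixed per-UMI background rate is the simplest possible choice here; the abstract states that the actual background error rate was modeled using an independent dataset, which this sketch does not attempt to reproduce.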