
Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data

BACKGROUND: Duplicate removal might be considered a well-resolved problem in next-generation sequencing (NGS) data processing. However, as NGS technology gains recognition in clinical applications, researchers are paying more attention to its sequencing errors and prefer to remove...


Bibliographic Details
Main Authors:	Chen, Shifu; Zhou, Yanqing; Chen, Yaru; Huang, Tanxiao; Liao, Wenting; Xu, Yun; Li, Zhicheng; Gu, Jia
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933617/ https://www.ncbi.nlm.nih.gov/pubmed/31881822 http://dx.doi.org/10.1186/s12859-019-3280-9

author	Chen, Shifu Zhou, Yanqing Chen, Yaru Huang, Tanxiao Liao, Wenting Xu, Yun Li, Zhicheng Gu, Jia
collection	PubMed
description	BACKGROUND: Duplicate removal might be considered a well-resolved problem in next-generation sequencing (NGS) data processing. However, as NGS technology gains recognition in clinical applications, researchers are paying more attention to its sequencing errors and prefer to remove these errors while performing deduplication. Recently, a technology called the unique molecular identifier (UMI) has been developed to better identify sequencing reads derived from different DNA fragments. Most existing duplicate-removal tools cannot handle UMI-integrated data. Some modern tools can work with UMIs, but are usually slow and use too much memory. Furthermore, existing tools rarely report rich statistical results, which are very important for quality control and downstream analysis. These unmet requirements drove us to develop an ultra-fast, simple, lightweight but powerful tool for duplicate removal and sequence error suppression, with features for handling UMIs and reporting informative results. RESULTS: This paper presents gencore, an efficient tool for duplicate removal and sequence error suppression in NGS data. The tool clusters the mapped sequencing reads and merges the reads in each cluster to generate a single consensus read. While the consensus read is generated, the random errors introduced by library construction and sequencing can be removed. This error-suppressing feature makes gencore well suited to detecting ultra-low-frequency mutations in deep sequencing data. When UMI technology is applied, gencore can use the UMIs to identify reads derived from the same original DNA fragment. Gencore reports statistical results in both HTML and JSON formats. The HTML report contains many interactive figures plotting coverage and duplication statistics. The JSON report contains all the statistical results and can be parsed by downstream programs. CONCLUSIONS: Compared with conventional tools such as Picard and SAMtools, gencore greatly reduces mapping mismatches in the output data, which are mostly caused by errors. Compared with newer tools such as UMI-Reducer and UMI-tools, gencore runs much faster, uses less memory, generates better consensus reads and provides simpler interfaces. To the best of our knowledge, gencore is the only duplicate-removal tool that generates both informative HTML and JSON reports. The tool is available at: https://github.com/OpenGene/gencore
format	Online Article Text
id	pubmed-6933617
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6933617 2019-12-30 BMC Bioinformatics Software BioMed Central 2019-12-27 /pmc/articles/PMC6933617/ /pubmed/31881822 http://dx.doi.org/10.1186/s12859-019-3280-9 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6933617/ https://www.ncbi.nlm.nih.gov/pubmed/31881822 http://dx.doi.org/10.1186/s12859-019-3280-9
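The description above says that gencore clusters mapped reads and merges each cluster into a single consensus read, optionally using UMIs to tell apart reads from different original fragments. This record does not give the details of gencore's algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: reads are grouped by chromosome, start position, length, and UMI, and the consensus is a simple per-base majority vote. The `Read` type and the `cluster_reads` and `consensus` functions are hypothetical names introduced for illustration and are not part of gencore.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, simplified mapped read (not gencore's data model)."""
    chrom: str
    start: int
    umi: str
    seq: str


def cluster_reads(reads):
    """Group reads that share chromosome, start, length, and UMI (assumed key)."""
    clusters = defaultdict(list)
    for r in reads:
        clusters[(r.chrom, r.start, len(r.seq), r.umi)].append(r)
    return clusters


def consensus(cluster):
    """Per-position majority vote; a tie for the top base is reported as 'N'."""
    out = []
    for column in zip(*(r.seq for r in cluster)):
        (top, n), *rest = Counter(column).most_common(2)
        out.append("N" if rest and rest[0][1] == n else top)
    return "".join(out)


reads = [
    Read("chr1", 100, "ACGT", "ACGTAC"),
    Read("chr1", 100, "ACGT", "ACGTAC"),
    Read("chr1", 100, "ACGT", "ACCTAC"),  # one random error at the third base
    Read("chr1", 200, "TTGA", "GGGTTT"),
]
for key, members in cluster_reads(reads).items():
    print(key, len(members), consensus(members))
```

In this toy input the three reads at position 100 collapse into one consensus read in which the single mismatching base is outvoted, which is the error-suppressing effect the description refers to. A real tool would also need to handle base qualities, paired-end reads, and alignment details, none of which are modeled here.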
