
ADEPT, a dynamic next generation sequencing data error-detection program with trimming

BACKGROUND: Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism di...


Bibliographic Details
Main authors:	Feng, Shihai; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S. G.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4772517/ https://www.ncbi.nlm.nih.gov/pubmed/26928302 http://dx.doi.org/10.1186/s12859-016-0967-z

author	Feng, Shihai Lo, Chien-Chi Li, Po-E Chain, Patrick S. G.
collection	PubMed
description	BACKGROUND: Illumina is the most widely used next generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. RESULTS: In this study, we present ADEPT, a dynamic error detection method that uses the quality scores of each nucleotide and its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. CONCLUSIONS: ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. As a result, ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0967-z) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4772517
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4772517 2016-03-02 ADEPT, a dynamic next generation sequencing data error-detection program with trimming Feng, Shihai Lo, Chien-Chi Li, Po-E Chain, Patrick S. G. BMC Bioinformatics Software BioMed Central 2016-02-29 /pmc/articles/PMC4772517/ /pubmed/26928302 http://dx.doi.org/10.1186/s12859-016-0967-z Text en © Feng et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	ADEPT, a dynamic next generation sequencing data error-detection program with trimming
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4772517/ https://www.ncbi.nlm.nih.gov/pubmed/26928302 http://dx.doi.org/10.1186/s12859-016-0967-z
work_keys_str_mv	AT fengshihai adeptadynamicnextgenerationsequencingdataerrordetectionprogramwithtrimming AT lochienchi adeptadynamicnextgenerationsequencingdataerrordetectionprogramwithtrimming AT lipoe adeptadynamicnextgenerationsequencingdataerrordetectionprogramwithtrimming AT chainpatricksg adeptadynamicnextgenerationsequencingdataerrordetectionprogramwithtrimming
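
The abstract describes flagging a base by comparing its quality score and those of its neighbors against the position-specific quality distribution of the whole run. The following is a minimal Python sketch of that general idea only, not ADEPT's actual algorithm: the function names, the z-score test, and the `z_cutoff` and `neighbor_drop` thresholds are all hypothetical and are not taken from the paper. It assumes equal-length reads whose Phred scores have already been decoded to integers.

```python
from statistics import mean, pstdev


def position_stats(all_quals):
    """Per-position (mean, population std) of quality scores across reads.

    Assumes every read has the same length (hypothetical simplification).
    """
    return [(mean(col), pstdev(col)) for col in zip(*all_quals)]


def flag_suspect_bases(read_quals, stats, z_cutoff=-1.5, neighbor_drop=10):
    """Return indices of bases that look like possible errors.

    A base is flagged when (a) its quality is unusually low for its
    position in the run (z-score at or below z_cutoff) and (b) it is at
    least neighbor_drop points below each adjacent base in the same read.
    Both thresholds are illustrative assumptions, not published values.
    """
    flagged = []
    for i, q in enumerate(read_quals):
        mu, sd = stats[i]
        z = (q - mu) / sd if sd > 0 else 0.0
        neighbors = [read_quals[j] for j in (i - 1, i + 1)
                     if 0 <= j < len(read_quals)]
        dips_below_neighbors = all(n - q >= neighbor_drop for n in neighbors)
        if z <= z_cutoff and dips_below_neighbors:
            flagged.append(i)
    return flagged


# Toy run of five reads; only read 1 has a sharp mid-read quality drop.
run = [
    [38, 38, 38, 38, 38],
    [38, 37, 12, 38, 38],
    [37, 38, 38, 37, 38],
    [38, 38, 37, 38, 37],
    [38, 38, 38, 38, 38],
]
stats = position_stats(run)
print(flag_suspect_bases(run[1], stats))  # -> [2]
```

Requiring both conditions is a design choice of this sketch: a score that is merely low for its position (for example, a 37 where the run is almost all 38) is not flagged unless it also drops sharply relative to the bases next to it.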
