SeqPurge: highly-sensitive adapter trimming for paired-end NGS data

BACKGROUND: Trimming of adapter sequences from short read data is a common preprocessing step during NGS data analysis. When performing paired-end sequencing, the overlap between forward and reverse read can be used to identify excess adapter sequences. This is exploited by several previously publis...

Bibliographic Details
Main Authors: Sturm, Marc; Schroeder, Christopher; Bauer, Peter
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4862148/
https://www.ncbi.nlm.nih.gov/pubmed/27161244
http://dx.doi.org/10.1186/s12859-016-1069-7
author Sturm, Marc
Schroeder, Christopher
Bauer, Peter
collection PubMed
description BACKGROUND: Trimming of adapter sequences from short read data is a common preprocessing step during NGS data analysis. When performing paired-end sequencing, the overlap between forward and reverse read can be used to identify excess adapter sequences. This is exploited by several previously published adapter trimming tools. However, our evaluation on amplicon-based data shows that most of the current tools are not able to remove all adapter sequences and that adapter contamination may even lead to spurious variant calls. RESULTS: Here we present SeqPurge (https://github.com/imgag/ngs-bits), a highly-sensitive adapter trimmer that uses a probabilistic approach to detect the overlap between forward and reverse reads of Illumina sequencing data. SeqPurge can detect very short adapter sequences, even if only one base long. Compared to other adapter trimmers specifically designed for paired-end data, we found that SeqPurge achieves a higher sensitivity. The number of remaining adapter bases after trimming is reduced by up to 90 %, depending on the compared tool. In simulations with different error rates, we found that SeqPurge is also the most error-tolerant adapter trimmer in the comparison. CONCLUSION: SeqPurge achieves a very high sensitivity and a high error-tolerance, combined with a specificity and runtime that are comparable to other state-of-the-art adapter trimmers. The very good adapter trimming performance, complemented with additional features such as quality-based trimming and basic quality control, makes SeqPurge an excellent choice for the pre-processing of paired-end NGS data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1069-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4862148
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4862148 2016-05-20 SeqPurge: highly-sensitive adapter trimming for paired-end NGS data Sturm, Marc Schroeder, Christopher Bauer, Peter BMC Bioinformatics Software
BioMed Central 2016-05-10 /pmc/articles/PMC4862148/ /pubmed/27161244 http://dx.doi.org/10.1186/s12859-016-1069-7 Text en © Sturm et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SeqPurge: highly-sensitive adapter trimming for paired-end NGS data
topic Software
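
The abstract notes that, in paired-end sequencing, the overlap between forward and reverse reads can be used to identify excess adapter sequence. The following is a minimal sketch of that general idea only; it is not SeqPurge's implementation. The function names, the binomial tail used as a chance-match score, and the thresholds `max_p` and `min_match_fraction` are all illustrative assumptions that do not come from the record. When the sequenced insert is shorter than the read length, the first `insert_len` bases of the forward read equal the reverse complement of the first `insert_len` bases of the reverse read, and everything after that position is adapter read-through.

```python
from math import comb

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def chance_match_probability(length: int, matches: int, p_match: float = 0.25) -> float:
    """Binomial tail: probability that at least `matches` of `length`
    positions agree by chance, assuming uniform, independent bases."""
    return sum(
        comb(length, k) * p_match**k * (1 - p_match) ** (length - k)
        for k in range(matches, length + 1)
    )


def find_insert_length(fwd: str, rev: str, max_p: float = 1e-6,
                       min_match_fraction: float = 0.8):
    """Return the most plausible insert length, or None if no candidate
    overlap is significant. Thresholds are hypothetical defaults."""
    best = None  # (insert_len, p_value)
    for insert_len in range(1, min(len(fwd), len(rev)) + 1):
        a = fwd[:insert_len]
        b = reverse_complement(rev[:insert_len])
        matches = sum(x == y for x, y in zip(a, b))
        if matches < min_match_fraction * insert_len:
            continue  # too many mismatches to be a believable overlap
        p = chance_match_probability(insert_len, matches)
        if p <= max_p and (best is None or p < best[1]):
            best = (insert_len, p)
    return None if best is None else best[0]


def trim_pair(fwd: str, rev: str):
    """Cut both reads at the detected insert length; leave them untouched
    when no significant overlap is found."""
    insert_len = find_insert_length(fwd, rev)
    if insert_len is None:
        return fwd, rev
    return fwd[:insert_len], rev[:insert_len]


if __name__ == "__main__":
    insert = "ACGTTGCAAGGCTTAACCGGTACGATCGTA"  # made-up 30-base insert
    adapter = "AGATCGGAAG"                     # made-up adapter prefix
    fwd_read = insert + adapter
    rev_read = reverse_complement(insert) + adapter
    print(trim_pair(fwd_read, rev_read))
```

With these example thresholds, a perfect overlap needs at least 10 bases to reach `max_p`, so this sketch reports no overlap for very short reads. A production trimmer would also weigh base qualities and match the known adapter sequences themselves, neither of which is attempted here.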