
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

Bibliographic details
Main authors: Falgueras, Juan; Lara, Antonio J; Fernández-Pozo, Noé; Cantón, Francisco R; Pérez-Trabado, Guillermo; Claros, M Gonzalo
Format: Text
Language: English
Published: BioMed Central, 2010
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2832897/
https://www.ncbi.nlm.nih.gov/pubmed/20089148
http://dx.doi.org/10.1186/1471-2105-11-38
collection PubMed
description BACKGROUND: High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms.
RESULTS: SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming.
CONCLUSIONS: SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.
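The abstract describes a pipeline of stages that trim or discard reads while recording what happened to each sequence at every stage. The Python sketch below illustrates only that general pattern. It is not SeqTrim's code or algorithms: the `Read` class, the stage functions (`trim_low_quality`, `reject_short`, `reject_low_complexity`), their thresholds, and the entropy-based low-complexity test are hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from math import log2


@dataclass
class Read:
    name: str
    seq: str
    qual: list[int]  # one Phred score per base
    log: list[str] = field(default_factory=list)  # what each stage did
    rejected: bool = False


def trim_low_quality(read: Read, min_q: int = 20) -> Read:
    """Hypothetical stage: clip bases with Phred < min_q from both ends."""
    start, end = 0, len(read.seq)
    while start < end and read.qual[start] < min_q:
        start += 1
    while end > start and read.qual[end - 1] < min_q:
        end -= 1
    if (start, end) != (0, len(read.seq)):
        read.log.append(f"quality: kept [{start}:{end}]")
    read.seq, read.qual = read.seq[start:end], read.qual[start:end]
    return read


def reject_short(read: Read, min_len: int = 30) -> Read:
    """Hypothetical stage: discard reads shorter than min_len after trimming."""
    if len(read.seq) < min_len:
        read.rejected = True
        read.log.append(f"too short: {len(read.seq)} < {min_len}")
    return read


def reject_low_complexity(read: Read, min_entropy: float = 1.5) -> Read:
    """Hypothetical stage: discard reads whose base-composition entropy (bits) is low."""
    n = len(read.seq)
    entropy = sum(-(c / n) * log2(c / n) for c in Counter(read.seq).values())
    if entropy < min_entropy:
        read.rejected = True
        read.log.append(f"low complexity: entropy {entropy:.2f} < {min_entropy}")
    return read


# Stages run in this order; the length check precedes the complexity check.
PIPELINE = [trim_low_quality, reject_short, reject_low_complexity]


def run_pipeline(read: Read, stages=PIPELINE) -> Read:
    """Apply stages in order, stopping at the first stage that rejects the read."""
    for stage in stages:
        read = stage(read)
        if read.rejected:
            break
    return read


if __name__ == "__main__":
    reads = [
        Read("r1", "ACGTTGCA" * 5, [10, 12] + [35] * 36 + [15, 8]),
        Read("r2", "A" * 50, [38] * 50),
        Read("r3", "ACGT" * 5, [30] * 20),
    ]
    for r in map(run_pipeline, reads):
        print(r.name, "rejected" if r.rejected else "kept", r.log)
    # r1 kept ['quality: kept [2:38]']
    # r2 rejected ['low complexity: entropy 0.00 < 1.5']
    # r3 rejected ['too short: 20 < 30']
```

Keeping a per-read log is one simple way to let users inspect what each stage did to an individual sequence, which is the kind of reporting the abstract attributes to SeqTrim.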
id pubmed-2832897
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics (Software article), BioMed Central, 2010-01-20; record pubmed-2832897 (2010-03-06)
rights Copyright ©2010 Falgueras et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.