
TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

BACKGROUND: Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Fur...


Bibliographic Details
Main Authors: Schmieder, Robert; Lim, Yan Wei; Rohwer, Forest; Edwards, Robert
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2910026/
https://www.ncbi.nlm.nih.gov/pubmed/20573248
http://dx.doi.org/10.1186/1471-2105-11-341
_version_ 1782184352709345280
author Schmieder, Robert
Lim, Yan Wei
Rohwer, Forest
Edwards, Robert
collection PubMed
description BACKGROUND: Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data.
RESULTS: TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences.
CONCLUSIONS: TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner.
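The kind of processing the abstract describes (removing a known tag while tolerating insertions and deletions, then filtering the trimmed reads for duplicates, short reads, and ambiguous bases) can be illustrated with a minimal Python sketch. This is not TagCleaner's implementation: the function names, the edit-distance approach, the tie-breaking rule, and all thresholds (`max_edits`, `min_length`, `max_n_fraction`) are assumptions made for illustration, and only a known 5' tag is handled.

```python
from typing import Iterable, List

# IUPAC ambiguity codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def base_matches(tag_base: str, read_base: str) -> bool:
    """True if the (possibly ambiguous) tag base permits the read base."""
    return read_base in IUPAC.get(tag_base, "")


def trim_5prime_tag(read: str, tag: str, max_edits: int = 1) -> str:
    """Strip `tag` from the start of `read` if the whole tag aligns to some
    read prefix with at most `max_edits` substitutions, insertions, or
    deletions. Otherwise return the read unchanged."""
    # Only prefixes up to len(tag) + max_edits can align within the budget.
    limit = min(len(read), len(tag) + max_edits)
    # prev[j] = edit distance between tag[:i] and read[:j] (row i of the table)
    prev = list(range(limit + 1))
    for i in range(1, len(tag) + 1):
        cur = [i] + [0] * limit
        for j in range(1, limit + 1):
            cost = 0 if base_matches(tag[i - 1], read[j - 1]) else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # tag base missing from the read
                         cur[j - 1] + 1)      # extra base inserted in the read
        prev = cur
    # Assumption: on ties, prefer the longest prefix so that no tag residue
    # is left behind after an insertion.
    best_j = min(range(limit + 1), key=lambda j: (prev[j], -j))
    return read[best_j:] if prev[best_j] <= max_edits else read


def clean_reads(reads: Iterable[str], tag: str, max_edits: int = 1,
                min_length: int = 8, max_n_fraction: float = 0.1) -> List[str]:
    """Trim the tag, then drop short reads, N-rich reads, and exact duplicates."""
    seen = set()
    kept = []
    for read in reads:
        trimmed = trim_5prime_tag(read.upper(), tag.upper(), max_edits)
        if not trimmed or len(trimmed) < min_length:
            continue  # too short after trimming
        if trimmed.count("N") / len(trimmed) > max_n_fraction:
            continue  # too many ambiguous bases
        if trimmed in seen:
            continue  # exact duplicate of an earlier trimmed read
        seen.add(trimmed)
        kept.append(trimmed)
    return kept
```

Calling `clean_reads(reads, "ACGT")` on a list of read strings returns the trimmed reads that pass all three filters, in input order. Detecting unknown tags and splitting fragment-to-fragment concatenations, both mentioned in the abstract, are outside the scope of this sketch.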
format Text
id pubmed-2910026
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2910026 2010-07-27 TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets Schmieder, Robert; Lim, Yan Wei; Rohwer, Forest; Edwards, Robert BMC Bioinformatics Software BioMed Central 2010-06-23 /pmc/articles/PMC2910026/ /pubmed/20573248 http://dx.doi.org/10.1186/1471-2105-11-341 Text en Copyright ©2010 Schmieder et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets
topic Software