
Filtering duplicate reads from 454 pyrosequencing data

Bibliographic Details
Main authors: Balzer, Susanne; Malde, Ketil; Grohme, Markus A.; Jonassen, Inge
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3605598/
https://www.ncbi.nlm.nih.gov/pubmed/23376350
http://dx.doi.org/10.1093/bioinformatics/btt047
collection PubMed
description Motivation: Throughout the recent years, 454 pyrosequencing has emerged as an efficient alternative to traditional Sanger sequencing and is widely used in both de novo whole-genome sequencing and metagenomics. Especially the latter application is extremely sensitive to sequencing errors and artificially duplicated reads. Both are common in 454 pyrosequencing and can create a strong bias in the estimation of diversity and composition of a sample. To date, there are several tools that aim to remove both sequencing noise and duplicates. Nevertheless, duplicate removal is often based on nucleotide sequences rather than on the underlying flow values, which contain additional information.
Results: With the novel tool JATAC, we present an approach towards a more accurate duplicate removal by analysing flow values directly. Making use of previous findings on 454 flow data characteristics, we combine read clustering with Bayesian distance measures. Finally, we provide a benchmark with an existing algorithm.
Availability: JATAC is freely available under the General Public License from http://malde.org/ketil/jatac/.
Contact: Ketil.Malde@imr.no
Supplementary information: Supplementary data are available at Bioinformatics online.
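The Results paragraph says JATAC detects duplicates from flow values rather than base calls, by combining read clustering with Bayesian distance measures. The abstract gives no further detail, so what follows is only a minimal Python sketch of that general idea under illustrative assumptions, not JATAC's actual model. It assumes each flow signal is Normal(n, 0.1 + 0.05n) around its true homopolymer length n (0 to 8, flat prior); it defines the distance between two reads as the mean negative log-probability that paired flows share the same length, over the flows both reads cover; and it clusters reads greedily, longest first, with a hypothetical threshold of 0.5. The names length_posterior, flow_distance and remove_duplicates, and every numeric parameter, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Illustrative assumptions only (not JATAC's model): homopolymer lengths
# 0..MAX_HOMOPOLYMER with a flat prior, and a Gaussian flow signal whose
# spread grows with the true length n.
MAX_HOMOPOLYMER = 8


def signal_sd(n: int) -> float:
    """Hypothetical standard deviation of the flow signal for true length n."""
    return 0.1 + 0.05 * n


def length_posterior(signal: float) -> List[float]:
    """P(n | signal) for n = 0..MAX_HOMOPOLYMER, normalised via log-sum-exp."""
    log_w = [
        -0.5 * ((signal - n) / signal_sd(n)) ** 2 - math.log(signal_sd(n))
        for n in range(MAX_HOMOPOLYMER + 1)
    ]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def flow_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean -log P(same homopolymer length) over the flows both reads cover."""
    shared = min(len(a), len(b))
    if shared == 0:
        return math.inf
    total = 0.0
    for x, y in zip(a, b):  # zip stops at the shorter flowgram
        same = sum(p * q for p, q in zip(length_posterior(x), length_posterior(y)))
        total -= math.log(max(same, 1e-300))  # guard against log(0)
    return total / shared


def remove_duplicates(
    reads: Dict[str, Sequence[float]],
    threshold: float = 0.5,
    min_flows: int = 4,
) -> List[str]:
    """Greedy clustering: visit reads longest first and keep a read only if it
    is not within `threshold` of a representative that was already kept."""
    order = sorted(reads, key=lambda rid: len(reads[rid]), reverse=True)
    kept: List[str] = []
    for rid in order:
        flows = reads[rid]
        duplicate = any(
            min(len(flows), len(reads[k])) >= min_flows
            and flow_distance(flows, reads[k]) <= threshold
            for k in kept
        )
        if not duplicate:
            kept.append(rid)
    return kept


if __name__ == "__main__":
    toy_reads = {
        "r1": [1.02, 0.03, 1.98, 0.97, 0.05, 1.01, 0.02, 3.05],
        "r2": [0.98, 0.01, 2.03, 1.04, 0.02, 0.99],  # truncated near-copy of r1
        "r3": [0.03, 1.01, 0.02, 0.98, 2.02, 0.04, 1.03, 0.01],  # different start
    }
    print(remove_duplicates(toy_reads))  # → ['r1', 'r3']
```

Comparing only the shared flow prefix lets the truncated copy r2 match its longer original, on the assumption that artificial duplicates share a start but may differ in length. The all-pairs loop is quadratic and is meant for illustration rather than full sequencing runs.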
id pubmed-3605598
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics
dates 2013-03-22, 2013-04-01, 2013-02-01
rights © The Author 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers