PANDAseq: paired-end assembler for illumina sequences

BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher se...

Bibliographic Details
Main Authors: Masella, Andre P, Bartram, Andrea K, Truszkowski, Jakub M, Brown, Daniel G, Neufeld, Josh D
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3471323/
https://www.ncbi.nlm.nih.gov/pubmed/22333067
http://dx.doi.org/10.1186/1471-2105-13-31
author Masella, Andre P
Bartram, Andrea K
Truszkowski, Jakub M
Brown, Daniel G
Neufeld, Josh D
author_facet Masella, Andre P
Bartram, Andrea K
Truszkowski, Jakub M
Brown, Daniel G
Neufeld, Josh D
author_sort Masella, Andre P
collection PubMed
description BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. RESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. CONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence.
format Online
Article
Text
id pubmed-3471323
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-34713232012-10-18 PANDAseq: paired-end assembler for illumina sequences Masella, Andre P Bartram, Andrea K Truszkowski, Jakub M Brown, Daniel G Neufeld, Josh D BMC Bioinformatics Software BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. RESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. CONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence. BioMed Central 2012-02-14 /pmc/articles/PMC3471323/ /pubmed/22333067 http://dx.doi.org/10.1186/1471-2105-13-31 Text en Copyright ©2012 Masella et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Masella, Andre P
Bartram, Andrea K
Truszkowski, Jakub M
Brown, Daniel G
Neufeld, Josh D
PANDAseq: paired-end assembler for illumina sequences
title PANDAseq: paired-end assembler for illumina sequences
title_full PANDAseq: paired-end assembler for illumina sequences
title_fullStr PANDAseq: paired-end assembler for illumina sequences
title_full_unstemmed PANDAseq: paired-end assembler for illumina sequences
title_short PANDAseq: paired-end assembler for illumina sequences
title_sort pandaseq: paired-end assembler for illumina sequences
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3471323/
https://www.ncbi.nlm.nih.gov/pubmed/22333067
http://dx.doi.org/10.1186/1471-2105-13-31