PANDAseq: paired-end assembler for illumina sequences
BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher se...
Main Authors: | Masella, Andre P, Bartram, Andrea K, Truszkowski, Jakub M, Brown, Daniel G, Neufeld, Josh D |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3471323/ https://www.ncbi.nlm.nih.gov/pubmed/22333067 http://dx.doi.org/10.1186/1471-2105-13-31 |
_version_ | 1782246405088215040 |
---|---|
author | Masella, Andre P Bartram, Andrea K Truszkowski, Jakub M Brown, Daniel G Neufeld, Josh D |
author_facet | Masella, Andre P Bartram, Andrea K Truszkowski, Jakub M Brown, Daniel G Neufeld, Josh D |
author_sort | Masella, Andre P |
collection | PubMed |
description | BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. RESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. CONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence. |
format | Online Article Text |
id | pubmed-3471323 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-34713232012-10-18 PANDAseq: paired-end assembler for illumina sequences Masella, Andre P Bartram, Andrea K Truszkowski, Jakub M Brown, Daniel G Neufeld, Josh D BMC Bioinformatics Software BACKGROUND: Illumina paired-end reads are used to analyse microbial communities by targeting amplicons of the 16S rRNA gene. Publicly available tools are needed to assemble overlapping paired-end reads while correcting mismatches and uncalled bases; many errors could be corrected to obtain higher sequence yields using quality information. RESULTS: PANDAseq assembles paired-end reads rapidly and with the correction of most errors. Uncertain error corrections come from reads with many low-quality bases identified by upstream processing. Benchmarks were done using real error masks on simulated data, a pure source template, and a pooled template of genomic DNA from known organisms. PANDAseq assembled reads more rapidly and with reduced error incorporation compared to alternative methods. CONCLUSIONS: PANDAseq rapidly assembles sequences and scales to billions of paired-end reads. Assembly of control libraries showed a 4-50% increase in the number of assembled sequences over naïve assembly with negligible loss of "good" sequence. BioMed Central 2012-02-14 /pmc/articles/PMC3471323/ /pubmed/22333067 http://dx.doi.org/10.1186/1471-2105-13-31 Text en Copyright ©2012 Masella et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Masella, Andre P Bartram, Andrea K Truszkowski, Jakub M Brown, Daniel G Neufeld, Josh D PANDAseq: paired-end assembler for illumina sequences |
title | PANDAseq: paired-end assembler for illumina sequences |
title_full | PANDAseq: paired-end assembler for illumina sequences |
title_fullStr | PANDAseq: paired-end assembler for illumina sequences |
title_full_unstemmed | PANDAseq: paired-end assembler for illumina sequences |
title_short | PANDAseq: paired-end assembler for illumina sequences |
title_sort | pandaseq: paired-end assembler for illumina sequences |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3471323/ https://www.ncbi.nlm.nih.gov/pubmed/22333067 http://dx.doi.org/10.1186/1471-2105-13-31 |
work_keys_str_mv | AT masellaandrep pandaseqpairedendassemblerforilluminasequences AT bartramandreak pandaseqpairedendassemblerforilluminasequences AT truszkowskijakubm pandaseqpairedendassemblerforilluminasequences AT browndanielg pandaseqpairedendassemblerforilluminasequences AT neufeldjoshd pandaseqpairedendassemblerforilluminasequences |