An extended genovo metagenomic assembler by incorporating paired-end information
Metagenome assembly is challenging because multiple genomes must be reconstructed from mixed reads of multiple species, and assemblers designed for single genomes do not adapt well to this setting. Genovo is a de novo metagenomic assembler based on a generative probabilistic model....
Main authors: | Afiahayati; Sato, Kengo; Sakakibara, Yasubumi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2013 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3817583/ https://www.ncbi.nlm.nih.gov/pubmed/24281688 http://dx.doi.org/10.7717/peerj.196 |
_version_ | 1782478094303494144 |
---|---|
author | Afiahayati; Sato, Kengo; Sakakibara, Yasubumi |
collection | PubMed |
description | Metagenome assembly is challenging because multiple genomes must be reconstructed from mixed reads of multiple species, and assemblers designed for single genomes do not adapt well to this setting. Genovo is a de novo metagenomic assembler based on a generative probabilistic model. It assembles all reads without discarding any in a preprocessing step, and can therefore extract more information from metagenomic data and, in principle, generate better assemblies. Paired-end sequencing is now widely used, yet Genovo was designed for 454 single-end reads. In this research, we extended Genovo to incorporate paired-end information so that it generates higher-quality assemblies from paired-end reads; we call the extension Xgenovo. First, we added a bonus parameter to the Chinese Restaurant Process, which serves as the prior that accounts for the unknown number of genomes in the sample. The bonus parameter encourages the two reads of a pair to be placed in the same contig and is intended to reduce chimeric contigs. Second, we modified how the location of a read within a contig is sampled: for the number of trials in the symmetric geometric distribution, we used a relative distance instead of the distance between the offset and the center of the contig used in Genovo. With this relative distance, a read sampled at the appropriate location has higher probability, so a read is more likely to be mapped to the correct location. Extensive experiments on simulated metagenomic datasets, ranging from simple to complex and with species coverage following uniform and lognormal distributions, showed that Xgenovo can be superior to the original Genovo and to MAP, a recently proposed metagenome assembler for 454 reads. Xgenovo generated longer N50 than Genovo and MAP while maintaining assembly quality, even for very complex metagenomic datasets consisting of 115 species. Xgenovo also demonstrated the potential to decrease the computational cost. These results indicate that the strategy worked well. The software and all simulated datasets are publicly available online at http://xgenovo.dna.bio.keio.ac.jp. |
format | Online Article Text |
id | pubmed-3817583 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-3817583 2013-11-26 An extended genovo metagenomic assembler by incorporating paired-end information Afiahayati; Sato, Kengo; Sakakibara, Yasubumi PeerJ Bioinformatics PeerJ Inc. 2013-10-31 /pmc/articles/PMC3817583/ /pubmed/24281688 http://dx.doi.org/10.7717/peerj.196 Text en © 2013 Afiahayati et al. http://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | An extended genovo metagenomic assembler by incorporating paired-end information |
topic | Bioinformatics |
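
The abstract in this record says that Xgenovo adds a bonus parameter to the Chinese Restaurant Process prior so that the two reads of a pair tend to land in the same contig. The record does not give the formula, so the following is only a minimal sketch of one plausible form, not the authors' implementation. It assumes the standard Chinese Restaurant Process weights (the number of reads already in a contig, and a concentration value `alpha` for opening a new contig) and assumes the bonus is simply added to the weight of the contig that already holds the read's mate. The names `crp_prior`, `alpha`, `bonus`, and `NEW_CONTIG` are hypothetical.

```python
from collections import Counter

# Hypothetical label for "open a new contig"; not a name from the paper.
NEW_CONTIG = "<new>"


def crp_prior(assignments, mate_contig=None, alpha=1.0, bonus=0.0):
    """Prior probability of placing one read in each contig.

    assignments: contig ids of the reads that are already placed.
    mate_contig: contig id that holds this read's mate, or None.
    alpha:       concentration weight for opening a new contig.
    bonus:       ASSUMED additive reward for joining the mate's contig.
    """
    # Standard CRP: an existing contig is weighted by its read count.
    weights = {contig: float(n) for contig, n in Counter(assignments).items()}

    # Assumed paired-end extension: favor the contig that holds the mate.
    if mate_contig in weights:
        weights[mate_contig] += bonus

    # Opening a new contig always keeps weight alpha.
    weights[NEW_CONTIG] = alpha

    total = sum(weights.values())
    return {contig: w / total for contig, w in weights.items()}


if __name__ == "__main__":
    placed = [0, 0, 0, 1]  # three reads in contig 0, one read in contig 1
    print(crp_prior(placed))
    print(crp_prior(placed, mate_contig=1, bonus=2.0))
```

With `bonus=0.0` the function reduces to the ordinary Chinese Restaurant Process, which makes it easy to compare the two priors on the same set of assignments.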