
Phylogenetic comparative assembly

BACKGROUND: Recent high throughput sequencing technologies are capable of generating a huge amount of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain which cannot be assembled any further. It i...

Bibliographic Details
Main authors: Husemann, Peter; Stoye, Jens
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2826331/
https://www.ncbi.nlm.nih.gov/pubmed/20047659
http://dx.doi.org/10.1186/1748-7188-5-3
_version_ 1782177853730717696
author Husemann, Peter
Stoye, Jens
collection PubMed
description BACKGROUND: Recent high throughput sequencing technologies are capable of generating a huge amount of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, often several contigs remain which cannot be assembled any further. It is still costly and time consuming to close all the gaps in order to acquire the whole genomic sequence. RESULTS: Here we propose an algorithm that takes several related genomes and their phylogenetic relationships into account to create a graph that contains the likelihood for each pair of contigs to be adjacent. Subsequently, this graph can be used to compute a layout graph that shows the most promising contig adjacencies in order to aid biologists in finishing the complete genomic sequence. The layout graph shows unique contig orderings where possible, and the best alternatives where necessary. CONCLUSIONS: Our new algorithm for contig ordering uses sequence similarity as well as phylogenetic information to estimate adjacencies of contigs. An evaluation of our implementation shows that it performs better than recent approaches while being much faster at the same time.
format Text
id pubmed-2826331
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2826331 2010-02-23 Phylogenetic comparative assembly Husemann, Peter Stoye, Jens Algorithms Mol Biol Research BioMed Central 2010-01-04 /pmc/articles/PMC2826331/ /pubmed/20047659 http://dx.doi.org/10.1186/1748-7188-5-3 Text en Copyright ©2010 Husemann and Stoye; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Husemann, Peter
Stoye, Jens
Phylogenetic comparative assembly
title Phylogenetic comparative assembly
topic Research
work_keys_str_mv AT husemannpeter phylogeneticcomparativeassembly
AT stoyejens phylogeneticcomparativeassembly