Finishing bacterial genome assemblies with Mix

MOTIVATION: Among challenges that hamper reaping the benefits of genome assembly are both unfinished assemblies and the ensuing experimental costs. First, numerous software solutions for genome de novo assembly are available, each having its advantages and drawbacks, without clear guidelines as to h...

Bibliographic Details
Main Authors: Soueidan, Hayssam; Maurier, Florence; Groppi, Alexis; Sirand-Pugnet, Pascal; Tardy, Florence; Citti, Christine; Dupuy, Virginie; Nikolski, Macha
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851838/
https://www.ncbi.nlm.nih.gov/pubmed/24564706
http://dx.doi.org/10.1186/1471-2105-14-S15-S16
_version_ 1782294364177825792
author Soueidan, Hayssam
Maurier, Florence
Groppi, Alexis
Sirand-Pugnet, Pascal
Tardy, Florence
Citti, Christine
Dupuy, Virginie
Nikolski, Macha
collection PubMed
description MOTIVATION: Among the challenges that hamper reaping the benefits of genome assembly are both unfinished assemblies and the ensuing experimental costs. First, numerous software solutions for de novo genome assembly are available, each with its advantages and drawbacks, without clear guidelines on how to choose among them. Second, these solutions produce draft assemblies that often require a resource-intensive finishing phase. METHODS: In this paper we address these two aspects by developing Mix, a tool that mixes two or more draft assemblies without relying on a reference genome, with the goal of reducing contig fragmentation and thus speeding up genome finishing. The proposed algorithm builds an extension graph in which vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a set of paths in the extension graph that maximizes the cumulative contig length. RESULTS: We evaluate the performance of Mix on bacterial NGS data from the GAGE-B study and apply it to newly sequenced Mycoplasma genomes. The resulting final assemblies demonstrate a significant improvement in overall assembly quality. In particular, Mix consistently provides better overall quality results even when the choice is guided solely by standard assembly statistics, as is the case for de novo projects. AVAILABILITY: Mix is implemented in Python and is available at https://github.com/cbib/MIX; novel data for our Mycoplasma study are available at http://services.cbib.u-bordeaux2.fr/mix/.
format Online
Article
Text
id pubmed-3851838
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3851838 2013-12-20 Finishing bacterial genome assemblies with Mix Soueidan, Hayssam Maurier, Florence Groppi, Alexis Sirand-Pugnet, Pascal Tardy, Florence Citti, Christine Dupuy, Virginie Nikolski, Macha BMC Bioinformatics Proceedings
BioMed Central 2013-10-15 /pmc/articles/PMC3851838/ /pubmed/24564706 http://dx.doi.org/10.1186/1471-2105-14-S15-S16 Text en Copyright © 2013 Soueidan et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Finishing bacterial genome assemblies with Mix
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851838/
https://www.ncbi.nlm.nih.gov/pubmed/24564706
http://dx.doi.org/10.1186/1471-2105-14-S15-S16