Cargando…

A safe and complete algorithm for metagenomic assembly

BACKGROUND: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally for...

Descripción completa

Detalles Bibliográficos
Autores principales: Obscura Acosta, Nidia, Mäkinen, Veli, Tomescu, Alexandru I.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2018
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5802251/
https://www.ncbi.nlm.nih.gov/pubmed/29445416
http://dx.doi.org/10.1186/s13015-018-0122-7
_version_ 1783298507244830720
author Obscura Acosta, Nidia
Mäkinen, Veli
Tomescu, Alexandru I.
author_facet Obscura Acosta, Nidia
Mäkinen, Veli
Tomescu, Alexandru I.
author_sort Obscura Acosta, Nidia
collection PubMed
description BACKGROUND: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or edges, of G. APPROACH: We address this problem with the “safe and complete” framework of Tomescu and Medvedev (Research in computational Molecular biology—20th annual conference, RECOMB 9649:152–163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as subwalk in all metagenomic assembly solutions for G. A safe algorithm is called complete if it returns all safe walks of G. RESULTS: We give graph-theoretic characterizations of the safe walks of G, and a safe and complete algorithm finding all safe walks of G. In the node-covering case, our algorithm runs in time [Formula: see text] , and in the edge-covering case it runs in time [Formula: see text] ; n and m denote the number of nodes and edges, respectively, of G. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation.
format Online
Article
Text
id pubmed-5802251
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-58022512018-02-14 A safe and complete algorithm for metagenomic assembly Obscura Acosta, Nidia Mäkinen, Veli Tomescu, Alexandru I. Algorithms Mol Biol Research BACKGROUND: Reconstructing the genome of a species from short fragments is one of the oldest bioinformatics problems. Metagenomic assembly is a variant of the problem asking to reconstruct the circular genomes of all bacterial species present in a sequencing sample. This problem can be naturally formulated as finding a collection of circular walks of a directed graph G that together cover all nodes, or edges, of G. APPROACH: We address this problem with the “safe and complete” framework of Tomescu and Medvedev (Research in computational Molecular biology—20th annual conference, RECOMB 9649:152–163, 2016). An algorithm is called safe if it returns only those walks (also called safe) that appear as subwalk in all metagenomic assembly solutions for G. A safe algorithm is called complete if it returns all safe walks of G. RESULTS: We give graph-theoretic characterizations of the safe walks of G, and a safe and complete algorithm finding all safe walks of G. In the node-covering case, our algorithm runs in time [Formula: see text] , and in the edge-covering case it runs in time [Formula: see text] ; n and m denote the number of nodes and edges, respectively, of G. This algorithm constitutes the first theoretical tight upper bound on what can be safely assembled from metagenomic reads using this problem formulation. BioMed Central 2018-02-07 /pmc/articles/PMC5802251/ /pubmed/29445416 http://dx.doi.org/10.1186/s13015-018-0122-7 Text en © The Author(s) 2018 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Obscura Acosta, Nidia
Mäkinen, Veli
Tomescu, Alexandru I.
A safe and complete algorithm for metagenomic assembly
title A safe and complete algorithm for metagenomic assembly
title_full A safe and complete algorithm for metagenomic assembly
title_fullStr A safe and complete algorithm for metagenomic assembly
title_full_unstemmed A safe and complete algorithm for metagenomic assembly
title_short A safe and complete algorithm for metagenomic assembly
title_sort safe and complete algorithm for metagenomic assembly
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5802251/
https://www.ncbi.nlm.nih.gov/pubmed/29445416
http://dx.doi.org/10.1186/s13015-018-0122-7
work_keys_str_mv AT obscuraacostanidia asafeandcompletealgorithmformetagenomicassembly
AT makinenveli asafeandcompletealgorithmformetagenomicassembly
AT tomescualexandrui asafeandcompletealgorithmformetagenomicassembly
AT obscuraacostanidia safeandcompletealgorithmformetagenomicassembly
AT makinenveli safeandcompletealgorithmformetagenomicassembly
AT tomescualexandrui safeandcompletealgorithmformetagenomicassembly