Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity

Bibliographic Details
Main Authors: Brown, C. Titus; Moritz, Dominik; O’Brien, Michael P.; Reidl, Felix; Reiter, Taylor; Sullivan, Blair D.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336657/
https://www.ncbi.nlm.nih.gov/pubmed/32631445
http://dx.doi.org/10.1186/s13059-020-02066-4
Description
Summary: Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.