Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics

Bibliographic Details
Main authors: Danko, David C., Meleshko, Dmitry, Bezdan, Daniela, Mason, Christopher, Hajirasouliha, Iman
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6314158/
https://www.ncbi.nlm.nih.gov/pubmed/30523036
http://dx.doi.org/10.1101/gr.235499.118
author Danko, David C.
Meleshko, Dmitry
Bezdan, Daniela
Mason, Christopher
Hajirasouliha, Iman
collection PubMed
description Emerging Linked-Read technologies (aka read cloud or barcoded short-reads) have revived interest in short-read technology as a viable approach to understand large-scale structures in genomes and metagenomes. Linked-Read technologies, such as the 10x Chromium system, use a microfluidic system and a specialized set of 3′ barcodes (aka UIDs) to tag short DNA reads sourced from the same long fragment of DNA; subsequently, the tagged reads are sequenced on standard short-read platforms. This approach results in interesting compromises. Each long fragment of DNA is only sparsely covered by reads, no information about the ordering of reads from the same fragment is preserved, and 3′ barcodes match reads from roughly 2–20 long fragments of DNA. However, compared to long-read technologies, the cost per base to sequence is far lower, far less input DNA is required, and the per base error rate is that of Illumina short-reads. In this paper, we formally describe a particular algorithmic issue common to Linked-Read technology: the deconvolution of reads with a single 3′ barcode into clusters that represent single long fragments of DNA. We introduce Minerva, a graph-based algorithm that approximately solves the barcode deconvolution problem for metagenomic data (where reference genomes may be incomplete or unavailable). Additionally, we develop two demonstrations where the deconvolution of barcoded reads improves downstream results, improving the specificity of taxonomic assignments and of k-mer-based clustering. To the best of our knowledge, we are the first to address the problem of barcode deconvolution in metagenomics.
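The barcode deconvolution problem stated in the abstract can be illustrated with a small sketch. This is not Minerva's implementation: the function names (`kmers`, `deconvolve_barcode`), the parameters `k` and `min_shared`, and the linking heuristic (two reads under the same barcode are joined when both share k-mers with reads from a common other barcode) are assumptions chosen only to show what "splitting one barcode's reads into per-fragment clusters without a reference" might look like.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def deconvolve_barcode(reads_by_barcode, target, k=4, min_shared=1):
    """Toy deconvolution of the reads tagged with barcode `target`.

    reads_by_barcode maps barcode -> list of read sequences.
    Returns clusters as sorted lists of read indices within `target`.
    Illustrative heuristic only (an assumption, not the published method).
    """
    # Index every k-mer seen under the *other* barcodes.
    kmer_to_barcodes = defaultdict(set)
    for barcode, reads in reads_by_barcode.items():
        if barcode == target:
            continue
        for read in reads:
            for km in kmers(read, k):
                kmer_to_barcodes[km].add(barcode)

    # For each target read, collect the other barcodes it shares a k-mer with.
    target_reads = reads_by_barcode[target]
    neighbours = []
    for read in target_reads:
        linked = set()
        for km in kmers(read, k):
            linked |= kmer_to_barcodes.get(km, set())
        neighbours.append(linked)

    # Union-find over target reads: join reads with enough barcodes in common.
    parent = list(range(len(target_reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(target_reads)), 2):
        if len(neighbours[i] & neighbours[j]) >= min_shared:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(target_reads)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


# Made-up reads: bc1 mixes two fragments; bc2 and bc3 each cover one of them.
example = {
    "bc1": ["ACGTAC", "GTCAAT", "TTTGGG", "CCCAAA"],
    "bc2": ["CGTACG", "TCAATG"],
    "bc3": ["TTGGGC", "CCCAAAT"],
}
print(deconvolve_barcode(example, "bc1"))  # [[0, 1], [2, 3]]
```

In this toy input, reads 0 and 1 never overlap each other directly; they end up in one cluster only because both share k-mers with reads under `bc2`, which is the kind of cross-barcode evidence a reference-free approach has to rely on.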
format Online
Article
Text
id pubmed-6314158
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-6314158 2019-01-11 Genome Res Method Cold Spring Harbor Laboratory Press 2019-01 /pmc/articles/PMC6314158/ /pubmed/30523036 http://dx.doi.org/10.1101/gr.235499.118 Text en © 2019 Danko et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
title Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics
topic Method