
The Nubeam reference-free approach to analyze metagenomic sequencing reads


Bibliographic Details
Main Authors: Dai, Hang; Guan, Yongtao
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7545149/
https://www.ncbi.nlm.nih.gov/pubmed/32883749
http://dx.doi.org/10.1101/gr.261750.120
collection PubMed
description We present Nubeam (nucleotide be a matrix) as a novel reference-free approach to analyze short sequencing reads. Nubeam represents nucleotides by matrices, transforms a read into a product of matrices, and assigns numbers to reads based on the product matrix. Nubeam capitalizes on the noncommutative property of matrix multiplication, such that different reads are assigned different numbers and similar reads similar numbers. A sample, which is a collection of reads, becomes a collection of numbers that form an empirical distribution. We demonstrate that the genetic difference between samples can be quantified by the distance between empirical distributions. Nubeam includes the k-mer method as a special case, but unlike the k-mer method, it is convenient for Nubeam to account for GC bias and nucleotide quality. As a reference-free approach, Nubeam avoids reference bias and mapping bias, and can work with organisms without reference genomes. Thus, Nubeam is ideal to analyze data sets from metagenomics whole genome shotgun (WGS) sequencing, where the amount of unmapped reads is substantial. When applied to a WGS sequencing data set to quantify distances between metagenomics samples from various human body habitats, Nubeam recapitulates findings made by mapping-based methods and sheds light on contributions of unmapped reads. Nubeam is also useful in analyzing 16S rRNA sequencing data, which is a more prevalent type of data set in metagenomics studies. In our analysis, Nubeam recapitulated the findings that natural microbiota in mouse gut are resilient under challenges, and Nubeam detected differences in vaginal microbiota between cases of polycystic ovary syndrome and healthy controls.
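The pipeline described in the abstract (nucleotides as matrices, a read as a matrix product, a sample as an empirical distribution of numbers) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from this record: the two 2x2 matrices `M1` and `M0`, the per-nucleotide indicator encoding, the weighted sum used to turn a product matrix into a number, and the CDF-area distance between samples. The published method may differ in each of these choices.

```python
import math

# Hypothetical matrices for indicator bits 1 and 0 (an illustrative choice).
M1 = ((1, 1), (0, 1))
M0 = ((1, 0), (1, 1))
IDENTITY = ((1, 0), (0, 1))
# Hypothetical weights used to collapse a 2x2 product matrix into one number.
WEIGHTS = (1.0, math.sqrt(2), math.sqrt(3), math.sqrt(5))


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def read_to_numbers(read):
    """Map a read to four numbers, one per nucleotide A, C, G, T.

    For each nucleotide, the read becomes a 0/1 indicator string, the
    indicator string becomes a product of M1/M0, and the product matrix
    is collapsed with a weighted sum of its entries.
    """
    numbers = []
    for base in "ACGT":
        product = IDENTITY
        for ch in read.upper():
            product = matmul(product, M1 if ch == base else M0)
        entries = (product[0][0], product[0][1], product[1][0], product[1][1])
        numbers.append(sum(w * e for w, e in zip(WEIGHTS, entries)))
    return tuple(numbers)


def cdf_distance(xs, ys):
    """Area between two empirical CDFs (1-D Wasserstein-1 distance)."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    i = j = 0
    for left, right in zip(points, points[1:]):
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


def sample_distance(sample_a, sample_b):
    """Sum the per-nucleotide CDF distances between two collections of reads."""
    nums_a = [read_to_numbers(r) for r in sample_a]
    nums_b = [read_to_numbers(r) for r in sample_b]
    return sum(
        cdf_distance([n[k] for n in nums_a], [n[k] for n in nums_b])
        for k in range(4)
    )


if __name__ == "__main__":
    # Because matrix multiplication is noncommutative, read order matters.
    print(read_to_numbers("AC") != read_to_numbers("CA"))
    # Toy samples with made-up reads, for illustration only.
    print(sample_distance(["ACGTAC", "ACGTTC"], ["TTTTAC", "CCCTAA"]))
```

The weighted sum is used instead of the plain trace because the trace of a product is unchanged when the factors are cyclically rotated, so it would give `AC` and `CA` the same value. Floating-point precision limits this toy version to short reads.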
id pubmed-7545149
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7545149 2020-10-19 Genome Res Method Cold Spring Harbor Laboratory Press 2020-09 /pmc/articles/PMC7545149/ /pubmed/32883749 http://dx.doi.org/10.1101/gr.261750.120 Text en
© 2020 Dai and Guan; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
topic Method