
Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads

Due to advancements in sequencing technology, sequence data production is no longer a constraint in the field of microbiology and has made it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequ...


Bibliographic Details
Main authors: Choudhari, Sulbha, Grigoriev, Andrey
Format: Online Article Text
Language: English
Published: MDPI 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374381/
https://www.ncbi.nlm.nih.gov/pubmed/28125031
http://dx.doi.org/10.3390/microorganisms5010004
_version_ 1782518877927768064
author Choudhari, Sulbha
Grigoriev, Andrey
author_facet Choudhari, Sulbha
Grigoriev, Andrey
author_sort Choudhari, Sulbha
collection PubMed
description Due to advancements in sequencing technology, sequence data production is no longer a constraint in the field of microbiology and has made it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequencing, affecting the nucleotide distribution of resulting sequence reads. Here, we illustrate such biases using two methods. One is based on phylogenetic heatmaps (PGHMs), a novel approach for compact visualization of sequence composition differences between two groups of sequences containing the same phylogenetic groups. This method is well suited for finding noise and biases when comparing metagenomics samples. We apply PGHMs to detect noise and bias in the data produced with different DNA extraction protocols, different sequencing platforms and different experimental frameworks. In parallel, we use principal component analysis displaying different clustering of sequences from each sample to support our findings and illustrate the utility of PGHMs. We considered contributions of the read length and GC-content variation and observed that in most cases biases were generally due to the GC-content of the reads.
format Online
Article
Text
id pubmed-5374381
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5374381 2017-04-10 Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads Choudhari, Sulbha Grigoriev, Andrey Microorganisms Article MDPI 2017-01-24 /pmc/articles/PMC5374381/ /pubmed/28125031 http://dx.doi.org/10.3390/microorganisms5010004 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Choudhari, Sulbha
Grigoriev, Andrey
Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
title Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374381/
https://www.ncbi.nlm.nih.gov/pubmed/28125031
http://dx.doi.org/10.3390/microorganisms5010004