Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads
Due to advancements in sequencing technology, sequence data production is no longer a constraint in the field of microbiology and has made it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequ...
Main authors: | Choudhari, Sulbha; Grigoriev, Andrey |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374381/ https://www.ncbi.nlm.nih.gov/pubmed/28125031 http://dx.doi.org/10.3390/microorganisms5010004 |
_version_ | 1782518877927768064 |
---|---|
author | Choudhari, Sulbha Grigoriev, Andrey |
author_facet | Choudhari, Sulbha Grigoriev, Andrey |
author_sort | Choudhari, Sulbha |
collection | PubMed |
description | Due to advancements in sequencing technology, sequence data production is no longer a constraint in the field of microbiology and has made it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequencing, affecting the nucleotide distribution of resulting sequence reads. Here, we illustrate such biases using two methods. One is based on phylogenetic heatmaps (PGHMs), a novel approach for compact visualization of sequence composition differences between two groups of sequences containing the same phylogenetic groups. This method is well suited for finding noise and biases when comparing metagenomics samples. We apply PGHMs to detect noise and bias in the data produced with different DNA extraction protocols, different sequencing platforms and different experimental frameworks. In parallel, we use principal component analysis displaying different clustering of sequences from each sample to support our findings and illustrate the utility of PGHMs. We considered contributions of the read length and GC-content variation and observed that in most cases biases were generally due to the GC-content of the reads. |
format | Online Article Text |
id | pubmed-5374381 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-53743812017-04-10 Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads Choudhari, Sulbha Grigoriev, Andrey Microorganisms Article Due to advancements in sequencing technology, sequence data production is no longer a constraint in the field of microbiology and has made it possible to study uncultured microbes or whole environments using metagenomics. However, these new technologies introduce different biases in metagenomic sequencing, affecting the nucleotide distribution of resulting sequence reads. Here, we illustrate such biases using two methods. One is based on phylogenetic heatmaps (PGHMs), a novel approach for compact visualization of sequence composition differences between two groups of sequences containing the same phylogenetic groups. This method is well suited for finding noise and biases when comparing metagenomics samples. We apply PGHMs to detect noise and bias in the data produced with different DNA extraction protocols, different sequencing platforms and different experimental frameworks. In parallel, we use principal component analysis displaying different clustering of sequences from each sample to support our findings and illustrate the utility of PGHMs. We considered contributions of the read length and GC-content variation and observed that in most cases biases were generally due to the GC-content of the reads. MDPI 2017-01-24 /pmc/articles/PMC5374381/ /pubmed/28125031 http://dx.doi.org/10.3390/microorganisms5010004 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Choudhari, Sulbha Grigoriev, Andrey Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads |
title | Phylogenetic Heatmaps Highlight Composition Biases in Sequenced Reads |
title_sort | phylogenetic heatmaps highlight composition biases in sequenced reads |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374381/ https://www.ncbi.nlm.nih.gov/pubmed/28125031 http://dx.doi.org/10.3390/microorganisms5010004 |
work_keys_str_mv | AT choudharisulbha phylogeneticheatmapshighlightcompositionbiasesinsequencedreads AT grigorievandrey phylogeneticheatmapshighlightcompositionbiasesinsequencedreads |