
GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms

BACKGROUND: Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insights into the genetic content of the biological samples studied, conclusions drawn are at risk from biases inherent to the DNA sequencing methods, including inaccurate abundanc...


Bibliographic Details
Main authors: Browne, Patrick Denis; Nielsen, Tue Kjærgaard; Kot, Witold; Aggerholm, Anni; Gilbert, M Thomas P; Puetz, Lara; Rasmussen, Morten; Zervas, Athanasios; Hansen, Lars Hestbjerg
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7016772/
https://www.ncbi.nlm.nih.gov/pubmed/32052832
http://dx.doi.org/10.1093/gigascience/giaa008
author Browne, Patrick Denis
Nielsen, Tue Kjærgaard
Kot, Witold
Aggerholm, Anni
Gilbert, M Thomas P
Puetz, Lara
Rasmussen, Morten
Zervas, Athanasios
Hansen, Lars Hestbjerg
collection PubMed
description BACKGROUND: Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insights into the genetic content of the biological samples studied, conclusions drawn are at risk from biases inherent to the DNA sequencing methods, including inaccurate abundance estimates as a function of genomic guanine-cytosine (GC) contents. RESULTS: We explored such GC biases across many commonly used platforms in experiments sequencing multiple genomes (with mean GC contents ranging from 28.9% to 62.4%) and metagenomes. GC bias profiles varied among different library preparation protocols and sequencing platforms. We found that our workflows using MiSeq and NextSeq were hindered by major GC biases, with problems becoming increasingly severe outside the 45–65% GC range, leading to a falsely low coverage in GC-rich and especially GC-poor sequences, where genomic windows with 30% GC content had >10-fold less coverage than windows close to 50% GC content. We also showed that GC content correlates tightly with coverage biases. The PacBio and HiSeq platforms also evidenced similar profiles of GC biases to each other, which were distinct from those seen in the MiSeq and NextSeq workflows. The Oxford Nanopore workflow was not afflicted by GC bias. CONCLUSIONS: These findings indicate potential sources of difficulty, arising from GC biases, in genome sequencing that could be pre-emptively addressed with methodological optimizations provided that the GC biases inherent to the relevant workflow are understood. Furthermore, it is recommended that a more critical approach be taken in quantitative abundance estimates in metagenomic studies. In the future, metagenomic studies should take steps to account for the effects of GC bias before drawing conclusions, or they should use a demonstrably unbiased workflow.
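The comparison described in the abstract (coverage in genomic windows as a function of their GC content) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `gc_fraction` and `relative_coverage_by_gc`, the non-overlapping window scheme, and the default window size and bin width are all assumptions chosen for illustration. It takes a reference sequence and a per-base depth list of the same length, and reports mean coverage per GC bin relative to the mean over all windows.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def relative_coverage_by_gc(seq, depth, window=1000, bin_pct=5):
    """Mean coverage per GC bin, divided by the mean over all windows.

    seq     -- reference sequence (str)
    depth   -- per-base read depth, same length as seq (hypothetical input)
    window  -- non-overlapping window size in bases (assumed value)
    bin_pct -- width of each GC bin in percentage points (assumed value)

    Returns a dict mapping the lower edge of each GC bin (in percent)
    to relative coverage; a value of 1.0 means average coverage.
    """
    if len(seq) != len(depth):
        raise ValueError("seq and depth must have the same length")

    per_bin = defaultdict(list)
    window_means = []
    # Trailing bases that do not fill a whole window are ignored.
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        mean_depth = sum(depth[start:start + window]) / window
        gc_pct = 100 * gc_fraction(chunk)
        bin_start = int(gc_pct // bin_pct) * bin_pct
        per_bin[bin_start].append(mean_depth)
        window_means.append(mean_depth)

    if not window_means:
        return {}
    overall = sum(window_means) / len(window_means)
    if overall == 0:
        return {b: 0.0 for b in sorted(per_bin)}
    return {b: (sum(v) / len(v)) / overall for b, v in sorted(per_bin.items())}
```

Under these assumptions, a workflow with the kind of bias reported for MiSeq and NextSeq would show relative coverage well below 1.0 in the low-GC bins, while a workflow without GC bias would stay close to 1.0 across bins.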
format Online
Article
Text
id pubmed-7016772
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7016772 2020-02-18 Gigascience Research Oxford University Press 2020-02-13 /pmc/articles/PMC7016772/ /pubmed/32052832 http://dx.doi.org/10.1093/gigascience/giaa008 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms
topic Research