Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data

Bibliographic Details
Main Authors: Schirmer, Melanie, D’Amore, Rosalinda, Ijaz, Umer Z., Hall, Neil, Quince, Christopher
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4787001/
https://www.ncbi.nlm.nih.gov/pubmed/26968756
http://dx.doi.org/10.1186/s12859-016-0976-y
author Schirmer, Melanie
D’Amore, Rosalinda
Ijaz, Umer Z.
Hall, Neil
Quince, Christopher
collection PubMed
description BACKGROUND: Illumina’s sequencing platforms are currently the most utilised sequencing systems worldwide. The technology has rapidly evolved over recent years and provides high throughput at low costs with increasing read-lengths and true paired-end reads. However, data from any sequencing technology contains noise, and our understanding of the peculiarities and sequencing errors encountered in Illumina data has lagged behind this rapid development. RESULTS: We conducted a systematic investigation of errors and biases in Illumina data based on the largest collection of in vitro metagenomic data sets to date. We evaluated the Genome Analyzer II, HiSeq and MiSeq and tested state-of-the-art low input library preparation methods. Analysing in vitro metagenomic sequencing data allowed us to determine biases directly associated with the actual sequencing process. The position- and nucleotide-specific analysis revealed a substantial bias related to motifs (3-mers preceding errors) ending in “GG”. On average the top three motifs were linked to 16% of all substitution errors. Furthermore, a preferential incorporation of ddGTPs was recorded. We hypothesise that all of these biases are related to the engineered polymerase and ddNTPs which are intrinsic to any sequencing-by-synthesis method. We show that quality-score-based error removal strategies can on average remove 69% of the substitution errors; however, the motif bias remains. CONCLUSION: Single-nucleotide polymorphism changes in bacterial genomes can cause significant changes in phenotype, including antibiotic resistance and virulence; detecting them within metagenomes is therefore vital. Current error removal techniques are not designed to target the peculiarities encountered in Illumina sequencing data and other sequencing-by-synthesis methods, causing biases to persist and potentially affect any conclusions drawn from the data. 
In order to develop effective diagnostic and therapeutic approaches we need to be able to identify systematic sequencing errors and distinguish these errors from true genetic variation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0976-y) contains supplementary material, which is available to authorized users.
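The motif analysis summarised in the abstract (3-mers preceding substitution errors, with “GG”-ending motifs over-represented) and the effect of quality-score-based error removal can be illustrated with a small Python sketch. This is not the authors' pipeline: the function names `preceding_motifs`, `top_motif_share` and `share_ending_in` are hypothetical, and the sketch assumes gap-free read-to-reference alignments (indels ignored), takes the motif from the reference bases immediately before each error in alignment order, and models quality-based error removal as a single Phred threshold `min_q`.

```python
from collections import Counter


def preceding_motifs(read, ref, quals=None, min_q=0, k=3):
    """Tally the reference k-mer immediately preceding each substitution.

    Assumption: read and ref are aligned, gap-free strings of equal length.
    quals is an optional list of per-base Phred scores for the read; a
    substitution on a base with quality < min_q is treated as filtered out.
    Positions 0..k-1 are skipped because they lack a full preceding k-mer.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference must be aligned to equal length")
    counts = Counter()
    for i in range(k, len(ref)):
        if read[i] != ref[i]:  # substitution error at position i
            if quals is not None and quals[i] < min_q:
                continue  # this base would be removed by the quality filter
            counts[ref[i - k:i]] += 1
    return counts


def top_motif_share(counts, n=3):
    """Fraction of all counted substitutions linked to the n most common motifs."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total


def share_ending_in(counts, suffix="GG"):
    """Fraction of counted substitutions whose preceding motif ends in suffix."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(v for m, v in counts.items() if m.endswith(suffix)) / total


if __name__ == "__main__":
    ref = "TAGGCATGGACCGTA"
    read = "TAGGTATGGCCCGAA"  # toy read with substitutions at positions 4, 9 and 13
    quals = [30] * len(ref)
    quals[13] = 10  # the third error sits on a low-quality base
    print(preceding_motifs(read, ref))
    print(preceding_motifs(read, ref, quals, min_q=20))
```

In this toy case the threshold removes only the error on the low-quality base; the errors after the “GG”-ending motifs carry high quality scores and are untouched, which mirrors the abstract's point that quality-based filtering alone leaves the motif bias in place.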
format Online
Article
Text
id pubmed-4787001
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4787001 2016-03-12 Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data Schirmer, Melanie D’Amore, Rosalinda Ijaz, Umer Z. Hall, Neil Quince, Christopher BMC Bioinformatics Research Article BioMed Central 2016-03-11 /pmc/articles/PMC4787001/ /pubmed/26968756 http://dx.doi.org/10.1186/s12859-016-0976-y Text en © Schirmer et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data
topic Research Article