
PCR-Induced Transitions Are the Major Source of Error in Cleaned Ultra-Deep Pyrosequencing Data

BACKGROUND: Ultra-deep pyrosequencing (UDPS) is used to identify rare sequence variants. The sequence depth is influenced by several factors including the error frequency of PCR and UDPS. This study investigated the characteristics and source of errors in raw and cleaned UDPS data. RESULTS: UDPS of...


Bibliographic Details
Main authors: Brodin, Johanna; Mild, Mattias; Hedskog, Charlotte; Sherwood, Ellen; Leitner, Thomas; Andersson, Björn; Albert, Jan
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3720931/
https://www.ncbi.nlm.nih.gov/pubmed/23894647
http://dx.doi.org/10.1371/journal.pone.0070388
author Brodin, Johanna
Mild, Mattias
Hedskog, Charlotte
Sherwood, Ellen
Leitner, Thomas
Andersson, Björn
Albert, Jan
collection PubMed
description BACKGROUND: Ultra-deep pyrosequencing (UDPS) is used to identify rare sequence variants. The sequence depth is influenced by several factors including the error frequency of PCR and UDPS. This study investigated the characteristics and source of errors in raw and cleaned UDPS data. RESULTS: UDPS of a 167-nucleotide fragment of the HIV-1 SG3Δenv plasmid was performed on the Roche/454 platform. The plasmid was diluted to one copy, PCR amplified and subjected to bidirectional UDPS on three occasions. The dataset consisted of 47,693 UDPS reads. Raw UDPS data had an average error frequency of 0.30% per nucleotide site. Most errors were insertions and deletions in homopolymeric regions. We used a cleaning strategy that removed almost all indel errors but had little effect on substitution errors; this reduced the error frequency to 0.056% per nucleotide. In cleaned data, the error frequency was similar in homopolymeric and non-homopolymeric regions, but varied considerably across sites. These site-specific error frequencies were moderately, but still significantly, correlated between runs (r = 0.15–0.65) and between forward and reverse sequencing directions within runs (r = 0.33–0.65). Furthermore, transition errors were 48 times more common than transversion errors (0.052% vs. 0.001%; p < 0.0001). Collectively, the results indicate that a considerable proportion of the sequencing errors that remained after data cleaning were generated during the PCR that preceded UDPS. CONCLUSIONS: A majority of the sequencing errors that remained after data cleaning were introduced by PCR prior to sequencing, which means that they will be independent of the platform used for next-generation sequencing. The transition vs. transversion error bias in cleaned UDPS data will influence the detection limits of rare mutations and sequence variants.
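The transition vs. transversion comparison in the description can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`classify_substitution`, `substitution_error_rates`) and the toy reference and reads are hypothetical, and the sketch assumes reads that are already aligned to the reference, have the same length as the reference (indels removed), and contain only A, C, G and T.

```python
from collections import Counter

# Purines and pyrimidines; a substitution within one class is a transition,
# a substitution between the classes is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref_base, read_base):
    """Return 'transition', 'transversion', or None when the bases match.

    Assumes both bases are one of A, C, G, T (hypothetical helper).
    """
    ref_base, read_base = ref_base.upper(), read_base.upper()
    if ref_base == read_base:
        return None
    pair = {ref_base, read_base}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def substitution_error_rates(reference, reads):
    """Per-nucleotide transition and transversion frequencies.

    Assumes every read is aligned to the reference and has the same
    length, i.e. indel errors have already been removed.
    """
    counts = Counter()
    total_bases = 0
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("read length differs from reference length")
        for ref_base, read_base in zip(reference, read):
            total_bases += 1
            kind = classify_substitution(ref_base, read_base)
            if kind is not None:
                counts[kind] += 1
    if total_bases == 0:
        raise ValueError("no bases to evaluate")
    return {kind: counts[kind] / total_bases
            for kind in ("transition", "transversion")}


# Toy data, invented for illustration only.
reference = "ACGT"
reads = ["ACGT", "GCGT", "ACTT", "ACGT"]  # one A->G, one G->T
rates = substitution_error_rates(reference, reads)
```

With the toy data, each error class occurs once in 16 sequenced bases. Dividing the two rates gives the kind of transition-to-transversion ratio reported in the description, provided the transversion rate is non-zero.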
format Online
Article
Text
id pubmed-3720931
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3720931 2013-07-26 PCR-Induced Transitions Are the Major Source of Error in Cleaned Ultra-Deep Pyrosequencing Data. PLoS One, Research Article. Public Library of Science, 2013-07-23. /pmc/articles/PMC3720931/ /pubmed/23894647 http://dx.doi.org/10.1371/journal.pone.0070388 Text, en. https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title PCR-Induced Transitions Are the Major Source of Error in Cleaned Ultra-Deep Pyrosequencing Data
topic Research Article