
Impact of next-generation sequencing error on analysis of barcoded plasmid libraries of known complexity and sequence


Bibliographic Details
Main authors: Deakin, Claire T., Deakin, Jeffrey J., Ginn, Samantha L., Young, Paul, Humphreys, David, Suter, Catherine M., Alexander, Ian E., Hallwirth, Claus V.
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4176369/
https://www.ncbi.nlm.nih.gov/pubmed/25013183
http://dx.doi.org/10.1093/nar/gku607
Description: Barcoded vectors are promising tools for investigating clonal diversity and dynamics in hematopoietic gene therapy. Analysis of clones marked with barcoded vectors requires accurate identification of potentially large numbers of individually rare barcodes, when the exact number, sequence identity and abundance are unknown. This is an inherently challenging application, and the feasibility of using contemporary next-generation sequencing technologies is unresolved. To explore this potential application empirically, without prior assumptions, we sequenced barcode libraries of known complexity. Libraries containing 1, 10 and 100 Sanger-sequenced barcodes were sequenced using an Illumina platform, with a 100-barcode library also sequenced using a SOLiD platform. Libraries containing 1 and 10 barcodes were distinguished from false barcodes generated by sequencing error by a several log-fold difference in abundance. In 100-barcode libraries, however, expected and false barcodes overlapped and could not be resolved by bioinformatic filtering and clustering strategies. In independent sequencing runs, multiple false-positive barcodes appeared to be represented at higher abundance than known barcodes, despite their confirmed absence from the original library. Such errors, which potentially impact barcoding studies in an application-dependent manner, are consistent with the existence of both stochastic and systematic error, the mechanism of which is yet to be fully resolved.
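The abstract contrasts separation by abundance with bioinformatic filtering and clustering. As a hedged illustration only, and not the authors' pipeline, the Python sketch below shows one generic form of each step: greedy Hamming-distance clustering of barcode reads into abundance-ordered centres, followed by a log10 abundance cut-off relative to the most abundant centre. The function names, the `max_dist` and `max_log10_drop` parameters and the toy reads are all hypothetical assumptions chosen for the example.

```python
from collections import Counter
import math

# Hypothetical helpers for illustration; not the pipeline used in the paper.


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_barcodes(reads: list[str], max_dist: int = 1) -> dict[str, int]:
    """Greedy clustering in descending abundance order.

    Each distinct barcode is merged into the earliest-accepted centre
    (the centre with the highest original count) within max_dist
    mismatches; otherwise it becomes a new centre. Returns centre -> reads.
    """
    counts = Counter(reads)
    centres: dict[str, int] = {}
    # Sort by count (high to low), then by sequence, so results are deterministic.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centre in centres:
            if hamming(seq, centre) <= max_dist:
                centres[centre] += n
                break
        else:
            centres[seq] = n
    return centres


def filter_by_log_fold(centres: dict[str, int], max_log10_drop: float = 2.0) -> dict[str, int]:
    """Keep centres within max_log10_drop orders of magnitude of the top centre."""
    top = max(centres.values())
    return {s: n for s, n in centres.items() if math.log10(top / n) <= max_log10_drop}


# Hypothetical toy reads: two "true" barcodes, a 1-mismatch error and a distant false barcode.
reads = ["ACGTAC"] * 500 + ["TTGCAA"] * 450 + ["ACGTAA"] * 3 + ["GGGCCC"] * 2
centres = cluster_barcodes(reads, max_dist=1)
kept = filter_by_log_fold(centres, max_log10_drop=2.0)
print(centres)  # {'ACGTAC': 503, 'TTGCAA': 450, 'GGGCCC': 2}
print(kept)     # {'ACGTAC': 503, 'TTGCAA': 450}
```

In this toy case the one-mismatch read ACGTAA folds into ACGTAC, and the distant GGGCCC falls more than two orders of magnitude below the top centre, so it is discarded. The abstract reports that separation of this kind worked for 1- and 10-barcode libraries but not for 100-barcode libraries, where true and false abundances overlapped; a systematic error that produces an abundant false barcode would also pass both steps unchanged.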
Journal: Nucleic Acids Res (section: Methods Online)
Publication dates: 2014-09-15; 2014-07-10
Rights: © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com