
Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells

Bibliographic Details
Main authors: Beltman, Joost B.; Urbanus, Jos; Velds, Arno; van Rooij, Nienke; Rohr, Jan C.; Naik, Shalin H.; Schumacher, Ton N.
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4818877/
https://www.ncbi.nlm.nih.gov/pubmed/27038897
http://dx.doi.org/10.1186/s12859-016-0999-4
Collection: PubMed
Description: BACKGROUND: Next generation sequencing (NGS) of amplified DNA is a powerful tool to describe genetic heterogeneity within cell populations that can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences that are generated by PCR or sequencing errors. This issue applies, for instance, to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells by supplying these cells with unique heritable DNA tags. RESULTS: Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. CONCLUSIONS: Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0999-4) contains supplementary material, which is available to authorized users.
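The abstract's central observation, that a specific sequencing error arises at a roughly constant rate relative to its source sequence across samples sequenced in parallel, can be illustrated with a small Python sketch. This is a hypothetical illustration of the idea, not the authors' published algorithm: the names `hamming` and `flag_spurious`, the restriction to single-substitution neighbours, the median-ratio estimate of the shared error rate, and the `factor` and `min_samples` tolerances are all assumptions made here. Pairing every barcode with every other is quadratic, which is acceptable only for small barcode sets.

```python
from statistics import median


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def flag_spurious(counts, factor=3.0, min_samples=2):
    """Flag barcodes whose read counts look like errors of a more abundant neighbour.

    Hypothetical sketch, not the published method.
    counts: {sample: {barcode: reads}} for samples sequenced in parallel.
    Returns {sample: set of barcodes flagged as likely spurious in that sample}.
    """
    barcodes = sorted({bc for sample in counts.values() for bc in sample})
    # Candidate (parent, child) pairs: equal length, exactly one substitution apart.
    pairs = [(p, c) for p in barcodes for c in barcodes
             if p != c and len(p) == len(c) and hamming(p, c) == 1]
    flagged = {s: set() for s in counts}
    for parent, child in pairs:
        # Per-sample child/parent read ratio, only where the parent is more abundant.
        ratios = {}
        for s, sample in counts.items():
            p_reads, c_reads = sample.get(parent, 0), sample.get(child, 0)
            if p_reads > c_reads:
                ratios[s] = c_reads / p_reads
        if len(ratios) < min_samples:
            continue  # too few samples to estimate a shared error rate
        # Assumption: this specific error has a roughly constant rate across samples,
        # so the median ratio serves as a robust estimate of that rate.
        rate = median(ratios.values())
        for s, r in ratios.items():
            # A present child whose ratio is within `factor` of the shared rate
            # is explainable as an error of the parent in this sample.
            if r > 0 and r <= factor * rate:
                flagged[s].add(child)
    return flagged


if __name__ == "__main__":
    example = {
        "s1": {"AAAA": 1000, "AAAT": 10},
        "s2": {"AAAA": 20000, "AAAT": 200},
        "s3": {"AAAA": 500, "AAAT": 150},
    }
    print(flag_spurious(example))
    # → {'s1': {'AAAT'}, 's2': {'AAAT'}, 's3': set()}
```

In this toy example, `AAAT` in sample `s2` has 200 reads and would survive a fixed 100-read threshold, yet its ratio to `AAAA` (0.01) matches `s1`, so it is flagged as a likely error. In `s3` the ratio (0.3) is 30 times the shared rate, so `AAAT` is retained as a candidate true barcode.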
Record ID: pubmed-4818877
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Methodology Article), BioMed Central, published 2016-04-02
Rights: © Beltman et al. 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Topic: Methodology Article