
Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire

BACKGROUND: Recent advances in massively parallel sequencing have increased the depth at which T cell receptor (TCR) repertoires can be probed by >3log10, allowing for saturation sequencing of immune repertoires. The resolution of this sequencing is dependent on its accuracy, and direct assessmen...


Bibliographic Details
Main Authors: Nguyen, Phuong; Ma, Jing; Pei, Deqing; Obert, Caroline; Cheng, Cheng; Geiger, Terrence L
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045962/
https://www.ncbi.nlm.nih.gov/pubmed/21310087
http://dx.doi.org/10.1186/1471-2164-12-106
_version_ 1782198895321808896
author Nguyen, Phuong
Ma, Jing
Pei, Deqing
Obert, Caroline
Cheng, Cheng
Geiger, Terrence L
author_sort Nguyen, Phuong
collection PubMed
description BACKGROUND: Recent advances in massively parallel sequencing have increased the depth at which T cell receptor (TCR) repertoires can be probed by >3log10, allowing for saturation sequencing of immune repertoires. The resolution of this sequencing is dependent on its accuracy, and direct assessments of the errors formed during high throughput repertoire analyses are limited. RESULTS: We analyzed 3 monoclonal TCR from TCR transgenic, Rag(-/-) mice using Illumina® sequencing. A total of 27 sequencing reactions were performed for each TCR using a trifurcating design in which samples were divided into 3 at significant processing junctures. More than 20 million complementarity determining region (CDR) 3 sequences were analyzed. Filtering for lower quality sequences diminished but did not eliminate sequence errors, which occurred within 1-6% of sequences. Erroneous sequences were predominantly of correct length and contained single nucleotide substitutions. Rates of specific substitutions varied dramatically in a position-dependent manner. Four substitutions, all purine-pyrimidine transversions, predominated. Solid phase amplification and sequencing rather than liquid sample amplification and preparation appeared to be the primary sources of error. Analysis of polyclonal repertoires demonstrated the impact of error accumulation on data parameters. CONCLUSIONS: Caution is needed in interpreting repertoire data due to potential contamination with mis-sequence reads. However, a high association of errors with phred score, high relatedness of erroneous sequences with the parental sequence, dominance of specific nt substitutions, and skewed ratio of forward to reverse reads among erroneous sequences indicate approaches to filter erroneous sequences from repertoire data sets.
format Text
id pubmed-3045962
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3045962 2011-03-01 Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire Nguyen, Phuong Ma, Jing Pei, Deqing Obert, Caroline Cheng, Cheng Geiger, Terrence L BMC Genomics Research Article BioMed Central 2011-02-11 /pmc/articles/PMC3045962/ /pubmed/21310087 http://dx.doi.org/10.1186/1471-2164-12-106 Text en Copyright ©2011 Nguyen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of errors introduced during high throughput sequencing of the T cell receptor repertoire
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045962/
https://www.ncbi.nlm.nih.gov/pubmed/21310087
http://dx.doi.org/10.1186/1471-2164-12-106
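
The abstract's conclusions name several signals that could be used to filter erroneous reads: association with phred score, close relatedness to the parental sequence, and dominance of specific nucleotide substitutions. The following Python sketch illustrates how the first three might be combined for a monoclonal sample. It is not the authors' method. The function names, the input layout (pairs of sequence and per-base phred scores), the minimum phred threshold of 20, and the choice of the most abundant read as the parental sequence are all assumptions made for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}  # anything else is treated as a pyrimidine (assumes A/C/G/T input)


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_transversion(ref, alt):
    """True when a substitution swaps a purine for a pyrimidine or vice versa."""
    return (ref in PURINES) != (alt in PURINES)


def passes_quality(phred_scores, min_phred=20):
    """Keep a read only if every base meets the (assumed) phred cutoff."""
    return min(phred_scores) >= min_phred


def flag_likely_errors(reads, min_phred=20):
    """Return the presumed parental sequence and its one-substitution variants.

    reads: iterable of (sequence, phred_scores) pairs from a monoclonal sample.
    The most abundant quality-passing sequence is assumed to be parental.
    """
    kept = [seq for seq, quals in reads if passes_quality(quals, min_phred)]
    counts = Counter(kept)
    parental, _ = counts.most_common(1)[0]

    flagged = {}
    for seq, n in counts.items():
        # Only same-length variants are compared, matching the abstract's
        # observation that erroneous reads were mostly of correct length.
        if seq == parental or len(seq) != len(parental):
            continue
        if hamming(seq, parental) == 1:
            pos = next(i for i, (x, y) in enumerate(zip(parental, seq)) if x != y)
            flagged[seq] = {
                "count": n,
                "position": pos,
                "substitution": parental[pos] + ">" + seq[pos],
                "transversion": is_transversion(parental[pos], seq[pos]),
            }
    return parental, flagged
```

In a polyclonal repertoire the parental sequence is not known in advance, so a sketch like this would need a different reference, for example comparing each rare sequence against its more abundant near neighbors. The abstract also reports a skewed forward-to-reverse read ratio among erroneous sequences, which is not modeled here.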