
The effect of strand bias in Illumina short-read sequencing data


Bibliographic Details
Main authors: Guo, Yan; Li, Jiang; Li, Chung-I; Long, Jirong; Samuels, David C; Shyr, Yu
Format: Online, Article, Text
Language: English
Published: BioMed Central 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3532123/
https://www.ncbi.nlm.nih.gov/pubmed/23176052
http://dx.doi.org/10.1186/1471-2164-13-666
Collection: PubMed
Description:
BACKGROUND: When using Illumina high-throughput short-read data, the genotypes inferred from the positive strand and the negative strand are sometimes significantly different, with one homozygous and the other heterozygous. This phenomenon is known as strand bias. In this study, we used Illumina short-read sequencing data to evaluate the effect of strand bias on genotyping quality and to explore its possible causes.

RESULT: We collected 22 breast cancer samples from 22 patients and sequenced their exomes using the Illumina GAIIx machine. By comparing the genotypes inferred from this sequencing data with the genotypes inferred from SNP chip data, we found that, in the sequencing data, SNPs with extreme strand bias did not have significantly lower consistency rates than SNPs with low or no strand bias. However, this result may be limited by the small subset of SNPs present in both the exome sequencing and the SNP chip data. We further compared the transition/transversion ratio and the number of novel non-synonymous SNPs between SNPs with low or no strand bias and those with extreme strand bias, and found that SNPs with low or no strand bias have better overall quality. We also found that strand bias occurs at random genomic positions across these samples, with no consistent pattern of strand bias location across samples. By comparing results from two different aligners, BWA and Bowtie, we found very consistent strand bias patterns; thus strand bias is unlikely to be caused by alignment artifacts. We successfully replicated our results using two additional independent datasets with different capture methods and Illumina sequencers.

CONCLUSION: Extreme strand bias indicates a potentially high false-positive rate for SNPs.
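To make the abstract's definitions concrete, here is a minimal Python sketch, assuming per-strand reference/alternate read counts are already available for each SNP. It illustrates (1) the discordance the abstract calls strand bias, where one strand looks homozygous and the other heterozygous, and (2) the transition/transversion (Ti/Tv) ratio used as a quality indicator. It is not the authors' method: the function names and the 0.2/0.8 allele-fraction cut-offs are hypothetical, and the strand-bias score is a two-sided Fisher's exact test on the strand-by-allele 2×2 table (the idea behind GATK's FisherStrand annotation), swapped in for illustration rather than taken from the paper.

```python
from math import comb
from typing import Iterable, Tuple

# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def strand_genotype(ref: int, alt: int,
                    het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Naive genotype from one strand's read counts.

    The 0.2/0.8 alt-fraction cut-offs are hypothetical; real callers
    use genotype likelihoods rather than fixed thresholds.
    """
    depth = ref + alt
    if depth == 0:
        return "no_call"
    alt_fraction = alt / depth
    if alt_fraction < het_low:
        return "hom_ref"
    if alt_fraction > het_high:
        return "hom_alt"
    return "het"


def fisher_strand_p(fwd_ref: int, fwd_alt: int, rev_ref: int, rev_alt: int) -> float:
    """Two-sided Fisher's exact test on the strand-by-allele 2x2 table.

    Rows are strands (forward, reverse); columns are alleles (ref, alt).
    A small p-value means the alleles are unevenly spread across strands.
    """
    row_fwd, row_rev = fwd_ref + fwd_alt, rev_ref + rev_alt
    col_ref = fwd_ref + rev_ref
    total = row_fwd + row_rev

    def table_prob(x: int) -> float:
        # Hypergeometric probability that x of the ref reads are on the
        # forward strand, with all row and column totals held fixed.
        return comb(row_fwd, x) * comb(row_rev, col_ref - x) / comb(total, col_ref)

    p_observed = table_prob(fwd_ref)
    x_min, x_max = max(0, col_ref - row_rev), min(row_fwd, col_ref)
    probs = [table_prob(x) for x in range(x_min, x_max + 1)]
    # Sum over all tables no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    p = sum(q for q in probs if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Transition/transversion ratio over (ref, alt) single-base substitutions."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution, so neither Ti nor Tv
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")


# Forward strand looks homozygous reference; reverse strand looks heterozygous.
print(strand_genotype(30, 0), strand_genotype(12, 14))  # hom_ref het
print(fisher_strand_p(30, 0, 12, 14) < 0.05)            # True
print(ti_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]))  # 3.0
```

In this toy example the forward strand supports a homozygous reference call while the reverse strand supports a heterozygous call, and the Fisher test flags the imbalance. Because a p-value shrinks with depth, even a mild strand imbalance can reach significance at high coverage, so any cut-off on it is a judgment call.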
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: BMC Genomics, Research Article, BioMed Central, 2012-11-24
Copyright ©2012 Guo et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.