Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data

Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on...

Bibliographic Details
Main authors: Degner, Jacob F., Marioni, John C., Pai, Athma A., Pickrell, Joseph K., Nkadori, Everlyne, Gilad, Yoav, Pritchard, Jonathan K.
Format: Text
Language: English
Published: Oxford University Press 2009
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2788925/
https://www.ncbi.nlm.nih.gov/pubmed/19808877
http://dx.doi.org/10.1093/bioinformatics/btp579
_version_ 1782175014889455616
author Degner, Jacob F.
Marioni, John C.
Pai, Athma A.
Pickrell, Joseph K.
Nkadori, Everlyne
Gilad, Yoav
Pritchard, Jonathan K.
author_sort Degner, Jacob F.
collection PubMed
description Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias toward higher mapping rates of the allele in the reference sequence, compared with the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ∼5–10% of SNPs still have an inherent bias toward more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Availability: Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome and analyzing the simulation output are available upon request from JFD. Raw short read data were deposited in GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18156. Contact: jdegner@uchicago.edu; marioni@uchicago.edu; gilad@uchicago.edu; pritch@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online.
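The abstract above describes two operations that are easy to illustrate on toy data: masking known SNP positions in a reference sequence, and measuring how strongly reads at a heterozygous SNP favor the reference allele. The following is a minimal Python sketch, not the authors' Perl and R scripts. The function names `mask_snps` and `reference_fraction`, the use of `N` as the mask character, and the toy sequence and counts are all assumptions made for illustration; the abstract does not specify how the masking was carried out.

```python
from typing import Iterable


def mask_snps(reference: str, snp_positions: Iterable[int], mask_char: str = "N") -> str:
    """Return a copy of `reference` with each 0-based SNP position masked.

    Assumption: masking is done by substituting `mask_char` (default "N"),
    so that neither allele matches the reference at that site.
    """
    masked = list(reference)
    for pos in snp_positions:
        if not 0 <= pos < len(masked):
            raise IndexError(f"SNP position {pos} is outside the sequence")
        masked[pos] = mask_char
    return "".join(masked)


def reference_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads covering a heterozygous SNP that carry the reference allele.

    With no mapping bias and no allele-specific expression, this is
    expected to be close to 0.5.
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return ref_reads / total


if __name__ == "__main__":
    toy_reference = "ACGTACGTAC"  # hypothetical 10 bp reference
    print(mask_snps(toy_reference, [2, 7]))  # ACNTACGNAC
    print(reference_fraction(60, 40))        # hypothetical read counts
```

A fraction consistently above 0.5 across many heterozygous SNPs would be the kind of reference bias the abstract reports; comparing it before and after masking is one way to check whether masking removed that bias.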
format Text
id pubmed-2788925
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2788925 2009-12-07 Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data Bioinformatics Original Papers Oxford University Press 2009-12-15 2009-10-06 /pmc/articles/PMC2788925/ /pubmed/19808877 http://dx.doi.org/10.1093/bioinformatics/btp579 Text en http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data
topic Original Papers