Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome


Bibliographic Details
Main authors: Stevenson, Kraig R; Coolon, Joseph D; Wittkopp, Patricia J
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751238/
https://www.ncbi.nlm.nih.gov/pubmed/23919664
http://dx.doi.org/10.1186/1471-2164-14-536
author Stevenson, Kraig R
Coolon, Joseph D
Wittkopp, Patricia J
collection PubMed
description BACKGROUND: RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate this bias, but this approach is not always practical, especially for non-model organisms. To improve accuracy of ASE measured using a single reference genome, we identified properties of differentiating sites responsible for biased measures of relative ASE.
RESULTS: We found that clusters of differentiating sites prevented sequence reads from an alternate allele from aligning to the reference genome, causing a bias in relative ASE favoring the reference allele. This bias increased with greater sequence divergence between alleles. Increasing the number of mismatches allowed when aligning sequence reads to the reference genome and restricting analysis to genomic regions with fewer differentiating sites than the number of mismatches allowed almost completely eliminated this systematic bias. Accuracy of allelic abundance was increased further by excluding differentiating sites within sequence reads that could not be aligned uniquely within the genome (imperfect mappability) and reads that overlapped one or more insertions or deletions (indels) between alleles.
CONCLUSIONS: After aligning sequence reads to a single reference genome, excluding differentiating sites with at least as many neighboring differentiating sites as the number of mismatches allowed, imperfect mappability, and/or an indel(s) nearby resulted in measures of allelic abundance comparable to those derived from aligning sequence reads to both parental genomes.
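note The three exclusion rules named in the description can be illustrated with a short Python sketch. This is not the authors' code. The function name `filter_sites`, its parameters, and the definition of "neighboring" (another differentiating site close enough to fall on the same read, that is, within `read_length - 1` bases on either side) are assumptions made for illustration; the same distance is assumed for "an indel nearby".

```python
from bisect import bisect_left, bisect_right


def filter_sites(sites, max_mismatches, read_length,
                 unmappable=frozenset(), indels=()):
    """Return the differentiating sites that survive all three exclusions.

    Hypothetical helper: positions are integer coordinates on one chromosome.
    """
    sites = sorted(set(sites))
    indels = sorted(indels)
    span = read_length - 1  # assumed: farthest offset at which two positions share a read
    kept = []
    for pos in sites:
        # Rule 1: drop sites with at least `max_mismatches` neighboring sites.
        lo = bisect_left(sites, pos - span)
        hi = bisect_right(sites, pos + span)
        neighbors = hi - lo - 1  # subtract the site itself
        if neighbors >= max_mismatches:
            continue
        # Rule 2: drop sites flagged as imperfectly mappable.
        if pos in unmappable:
            continue
        # Rule 3: drop sites with an indel inside the same window.
        i = bisect_left(indels, pos - span)
        if i < len(indels) and indels[i] <= pos + span:
            continue
        kept.append(pos)
    return kept


example = filter_sites(
    sites=[100, 105, 110, 500, 900, 1300],
    max_mismatches=2,
    read_length=36,      # illustrative value, not taken from the record
    unmappable={900},
    indels=[1310],
)
print(example)
```

In this toy input the cluster at 100, 105 and 110 is dropped because each site has two neighbors and only two mismatches are allowed; raising `max_mismatches` to 3 would retain the cluster, which mirrors the trade-off the description reports.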
format Online
Article
Text
id pubmed-3751238
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3751238 2013-08-24 BMC Genomics Methodology Article
BioMed Central 2013-08-07 /pmc/articles/PMC3751238/ /pubmed/23919664 http://dx.doi.org/10.1186/1471-2164-14-536 Text en Copyright © 2013 Stevenson et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome
topic Methodology Article