Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome
BACKGROUND: RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate t...
Main authors: | Stevenson, Kraig R; Coolon, Joseph D; Wittkopp, Patricia J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751238/ https://www.ncbi.nlm.nih.gov/pubmed/23919664 http://dx.doi.org/10.1186/1471-2164-14-536 |
_version_ | 1782281558996025344 |
---|---|
author | Stevenson, Kraig R Coolon, Joseph D Wittkopp, Patricia J |
author_facet | Stevenson, Kraig R Coolon, Joseph D Wittkopp, Patricia J |
author_sort | Stevenson, Kraig R |
collection | PubMed |
description | BACKGROUND: RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate this bias, but this approach is not always practical, especially for non-model organisms. To improve accuracy of ASE measured using a single reference genome, we identified properties of differentiating sites responsible for biased measures of relative ASE. RESULTS: We found that clusters of differentiating sites prevented sequence reads from an alternate allele from aligning to the reference genome, causing a bias in relative ASE favoring the reference allele. This bias increased with greater sequence divergence between alleles. Increasing the number of mismatches allowed when aligning sequence reads to the reference genome and restricting analysis to genomic regions with fewer differentiating sites than the number of mismatches allowed almost completely eliminated this systematic bias. Accuracy of allelic abundance was increased further by excluding differentiating sites within sequence reads that could not be aligned uniquely within the genome (imperfect mappability) and reads that overlapped one or more insertions or deletions (indels) between alleles. CONCLUSIONS: After aligning sequence reads to a single reference genome, excluding differentiating sites with at least as many neighboring differentiating sites as the number of mismatches allowed, imperfect mappability, and/or an indel(s) nearby resulted in measures of allelic abundance comparable to those derived from aligning sequence reads to both parental genomes. |
format | Online Article Text |
id | pubmed-3751238 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-37512382013-08-24 Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome Stevenson, Kraig R Coolon, Joseph D Wittkopp, Patricia J BMC Genomics Methodology Article BACKGROUND: RNA-seq can be used to measure allele-specific expression (ASE) by assigning sequence reads to individual alleles; however, relative ASE is systematically biased when sequence reads are aligned to a single reference genome. Aligning sequence reads to both parental genomes can eliminate this bias, but this approach is not always practical, especially for non-model organisms. To improve accuracy of ASE measured using a single reference genome, we identified properties of differentiating sites responsible for biased measures of relative ASE. RESULTS: We found that clusters of differentiating sites prevented sequence reads from an alternate allele from aligning to the reference genome, causing a bias in relative ASE favoring the reference allele. This bias increased with greater sequence divergence between alleles. Increasing the number of mismatches allowed when aligning sequence reads to the reference genome and restricting analysis to genomic regions with fewer differentiating sites than the number of mismatches allowed almost completely eliminated this systematic bias. Accuracy of allelic abundance was increased further by excluding differentiating sites within sequence reads that could not be aligned uniquely within the genome (imperfect mappability) and reads that overlapped one or more insertions or deletions (indels) between alleles. CONCLUSIONS: After aligning sequence reads to a single reference genome, excluding differentiating sites with at least as many neighboring differentiating sites as the number of mismatches allowed, imperfect mappability, and/or an indel(s) nearby resulted in measures of allelic abundance comparable to those derived from aligning sequence reads to both parental genomes. 
BioMed Central 2013-08-07 /pmc/articles/PMC3751238/ /pubmed/23919664 http://dx.doi.org/10.1186/1471-2164-14-536 Text en Copyright © 2013 Stevenson et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Stevenson, Kraig R Coolon, Joseph D Wittkopp, Patricia J Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome |
title | Sources of bias in measures of allele-specific expression derived from RNA-seq data aligned to a single reference genome |
title_sort | sources of bias in measures of allele-specific expression derived from rna-seq data aligned to a single reference genome |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3751238/ https://www.ncbi.nlm.nih.gov/pubmed/23919664 http://dx.doi.org/10.1186/1471-2164-14-536 |
work_keys_str_mv | AT stevensonkraigr sourcesofbiasinmeasuresofallelespecificexpressionderivedfromrnaseqdataalignedtoasinglereferencegenome AT coolonjosephd sourcesofbiasinmeasuresofallelespecificexpressionderivedfromrnaseqdataalignedtoasinglereferencegenome AT wittkopppatriciaj sourcesofbiasinmeasuresofallelespecificexpressionderivedfromrnaseqdataalignedtoasinglereferencegenome |
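
The site-exclusion rule stated in the abstract's conclusions can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Site` record, the function names, and the definition of a "neighboring" site (any other differentiating site within `read_length - 1` bases on either side) are assumptions made for illustration. Positions are also assumed to be unique and to lie on a single chromosome.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical record for one differentiating site on one chromosome."""
    position: int             # coordinate of the site
    mappable: bool = True     # True if reads covering the site align uniquely
    near_indel: bool = False  # True if an indel between alleles lies nearby


def count_neighbors(sorted_positions, position, window):
    """Count other differentiating sites within +/- window bases of position."""
    lo = bisect_left(sorted_positions, position - window)
    hi = bisect_right(sorted_positions, position + window)
    return hi - lo - 1  # subtract one for the site itself


def filter_sites(sites, max_mismatches, read_length):
    """Keep sites that pass all three exclusion criteria from the abstract.

    A site is dropped if it has at least max_mismatches neighboring
    differentiating sites, imperfect mappability, or a nearby indel.
    The +/- (read_length - 1) window is an assumed definition of "neighboring".
    """
    sorted_positions = sorted(site.position for site in sites)
    window = read_length - 1
    kept = []
    for site in sites:
        neighbors = count_neighbors(sorted_positions, site.position, window)
        if neighbors >= max_mismatches:
            continue  # too many nearby differences for alternate reads to align
        if not site.mappable or site.near_indel:
            continue
        kept.append(site)
    return kept


# Illustrative input only; these coordinates are not from the study.
example_sites = [
    Site(100), Site(105), Site(110),  # a cluster of three nearby sites
    Site(500),                        # an isolated site
    Site(900, mappable=False),        # imperfect mappability
    Site(1300, near_indel=True),      # indel nearby
]
strict = filter_sites(example_sites, max_mismatches=2, read_length=36)
relaxed = filter_sites(example_sites, max_mismatches=3, read_length=36)
```

In this sketch, raising `max_mismatches` lets clustered sites survive the neighbor filter, which mirrors the abstract's observation that allowing more mismatches during alignment reduces the bias toward the reference allele.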