Highly accurate quantification of allelic gene expression for population and disease genetics
Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA seq...
Main authors: | Saukkonen, Anna; Kilpinen, Helena; Hodgkinson, Alan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press 2022 |
Subjects: | Method |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9435737/ https://www.ncbi.nlm.nih.gov/pubmed/35794008 http://dx.doi.org/10.1101/gr.276296.121 |
_version_ | 1784781215225610240 |
---|---|
author | Saukkonen, Anna Kilpinen, Helena Hodgkinson, Alan |
collection | PubMed |
description | Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges still remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect biases (>10%) by ∼80% and increasing the number of sites that can be reliably quantified by ∼3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type–specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes. |
format | Online Article Text |
id | pubmed-9435737 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Cold Spring Harbor Laboratory Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-94357372022-09-16 Highly accurate quantification of allelic gene expression for population and disease genetics Saukkonen, Anna Kilpinen, Helena Hodgkinson, Alan Genome Res Method Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges still remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect biases (>10%) by ∼80% and increasing the number of sites that can be reliably quantified by ∼3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type–specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes. 
Cold Spring Harbor Laboratory Press 2022-08 /pmc/articles/PMC9435737/ /pubmed/35794008 http://dx.doi.org/10.1101/gr.276296.121 Text en © 2022 Saukkonen et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/. |
title | Highly accurate quantification of allelic gene expression for population and disease genetics |
topic | Method |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9435737/ https://www.ncbi.nlm.nih.gov/pubmed/35794008 http://dx.doi.org/10.1101/gr.276296.121 |