Identification and control for the effects of bioinformatic globin depletion on human RNA-seq differential expression analysis

Bibliographic Details
Main authors: Sheerin, Dylan; Lakay, Francisco; Esmail, Hanif; Kinnear, Craig; Sansom, Bianca; Glanzmann, Brigitte; Wilkinson, Robert J.; Ritchie, Matthew E.; Coussens, Anna K.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892020/
https://www.ncbi.nlm.nih.gov/pubmed/36725870
http://dx.doi.org/10.1038/s41598-023-28218-7
Collection: PubMed
Abstract: When profiling blood samples by RNA-sequencing (RNA-seq), RNA from haemoglobin (Hgb) can account for up to 70% of the transcriptome. Due to considerations of sequencing depth and power to detect biological variation, Hgb RNA is typically depleted prior to sequencing by hybridisation-based methods; an alternative approach is to deplete reads arising from Hgb RNA bioinformatically. In the present study, we compared the impact of these two approaches on the outcome of differential gene expression analysis performed using RNA-seq data from 58 human tuberculosis (TB) patient or contact whole blood samples (29 globin kit-depleted and 29 matched non-depleted), a subset of which were taken from the same patient at TB diagnosis and at six months post-TB treatment. Bioinformatic depletion of Hgb genes from the non-depleted samples (bioinformatic-depleted) substantially reduced library sizes (median = 57.24%), and fewer long non-coding, micro, small nuclear and small nucleolar RNAs were captured in these libraries. Profiling published TB gene signatures across all samples revealed weaker correlation between kit-depleted and bioinformatic-depleted pairs when the proportion of reads mapping to Hgb genes was higher in the non-depleted sample, particularly at the TB diagnosis time point. A set of putative “globin-fingerprint” genes was identified by directly comparing kit-depleted and bioinformatic-depleted samples at each time point. Two TB treatment response signatures also performed worse at distinguishing samples taken at TB diagnosis from those taken six months post-TB treatment when profiled on the bioinformatic-depleted samples than on their kit-depleted counterparts. These results demonstrate that failure to deplete Hgb RNA prior to sequencing has a negative impact on the sensitivity to detect disease-relevant gene expression changes even when bioinformatic removal is performed.
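The abstract contrasts kit-based globin depletion with removing Hgb reads bioinformatically and reports the resulting loss in library size and the proportion of reads mapping to Hgb genes. The following is a minimal Python sketch of the bioinformatic route, assuming gene-level counts per sample are already available. The globin gene set (`GLOBIN_GENES`) and all function names are hypothetical illustrations; this record does not describe the authors' actual pipeline or gene list.

```python
from typing import Dict

# Hypothetical set of human globin gene symbols; the record does not state
# which Hgb genes the study removed.
GLOBIN_GENES = {"HBA1", "HBA2", "HBB", "HBD", "HBG1", "HBG2",
                "HBE1", "HBZ", "HBM", "HBQ1"}


def globin_fraction(counts: Dict[str, int]) -> float:
    """Fraction of one sample's counts assigned to globin genes."""
    total = sum(counts.values())
    globin = sum(c for gene, c in counts.items() if gene in GLOBIN_GENES)
    return globin / total if total else 0.0


def bioinformatic_deplete(counts: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of the sample's counts with globin genes dropped."""
    return {gene: c for gene, c in counts.items() if gene not in GLOBIN_GENES}


def library_size_reduction(counts: Dict[str, int]) -> float:
    """Percentage by which dropping globin genes shrinks the library size."""
    before = sum(counts.values())
    after = sum(bioinformatic_deplete(counts).values())
    return 100.0 * (before - after) / before if before else 0.0


if __name__ == "__main__":
    # Toy counts for a single non-depleted sample (illustrative only).
    sample = {"HBB": 600, "HBA1": 100, "ACTB": 200, "GAPDH": 100}
    print(globin_fraction(sample))         # 0.7
    print(bioinformatic_deplete(sample))   # {'ACTB': 200, 'GAPDH': 100}
    print(library_size_reduction(sample))  # 70.0
```

On this toy sample, 700 of 1,000 counts fall on HBB and HBA1, so the globin fraction is 0.7 and the library shrinks by 70%. Filtering gene-level counts after quantification is only one way to deplete globin bioinformatically; reads could instead be removed before alignment, and the record does not say which approach the authors used.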
Institution: National Center for Biotechnology Information
Journal: Sci Rep (published online 2023-02-01)
Copyright: © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.