
Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries

BACKGROUND: The availability of fast alignment-free algorithms has greatly reduced the computational burden of RNA-seq processing, especially for relatively poorly assembled genomes. Using these approaches, previous RNA-seq datasets could potentially be processed and integrated with newly sequenced...


Bibliographic Details
Main authors: Bush, Stephen J., McCulloch, Mary E. B., Summers, Kim M., Hume, David A., Clark, Emily L.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470212/
https://www.ncbi.nlm.nih.gov/pubmed/28610557
http://dx.doi.org/10.1186/s12859-017-1714-9
author Bush, Stephen J.
McCulloch, Mary E. B.
Summers, Kim M.
Hume, David A.
Clark, Emily L.
author_sort Bush, Stephen J.
collection PubMed
description BACKGROUND: The availability of fast alignment-free algorithms has greatly reduced the computational burden of RNA-seq processing, especially for relatively poorly assembled genomes. Using these approaches, previous RNA-seq datasets could potentially be processed and integrated with newly sequenced libraries. Confounding factors in such integration include sequencing depth and methods of RNA extraction and selection. Different selection methods (typically, either polyA-selection or rRNA-depletion) omit different RNAs, resulting in different fractions of the transcriptome being sequenced. In particular, rRNA-depleted libraries sample a broader fraction of the transcriptome than polyA-selected libraries. This study aimed to develop a systematic means of accounting for library type that allows data from these two methods to be compared. RESULTS: The method was developed by comparing two RNA-seq datasets from ovine macrophages, identical except for RNA selection method. Gene-level expression estimates were obtained using a two-part process centred on the high-speed transcript quantification tool Kallisto. Firstly, a set of reference transcripts was defined that constitute a standardised RNA space, with expression from both datasets quantified against it. Secondly, a simple ratio-based correction was applied to the rRNA-depleted estimates. The outcome is an almost perfect correlation between gene expression estimates, independent of library type and across the full range of levels of expression. CONCLUSION: A combination of reference transcriptome filtering and a ratio-based correction can create equivalent expression profiles from both polyA-selected and rRNA-depleted libraries. This approach will allow meta-analysis and integration of existing RNA-seq data into transcriptional atlas projects. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1714-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5470212
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5470212 2017-06-19 Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries Bush, Stephen J. McCulloch, Mary E. B. Summers, Kim M. Hume, David A. Clark, Emily L. BMC Bioinformatics Methodology Article BioMed Central 2017-06-13 /pmc/articles/PMC5470212/ /pubmed/28610557 http://dx.doi.org/10.1186/s12859-017-1714-9 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470212/
https://www.ncbi.nlm.nih.gov/pubmed/28610557
http://dx.doi.org/10.1186/s12859-017-1714-9
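
The abstract describes a two-part process: expression from both library types is quantified against one shared set of reference transcripts, and a simple ratio-based correction is then applied to the rRNA-depleted estimates. The record does not say how the ratio is computed, so the following is only a minimal sketch under stated assumptions: gene-level TPM values are already available as plain dictionaries (for example, summed from Kallisto transcript-level output), the ratio is taken per gene as polyA TPM divided by rRNA-depleted TPM in a paired calibration dataset, and genes without a usable ratio are left unchanged. The function names `gene_ratios` and `apply_correction` are hypothetical and are not taken from the article.

```python
from typing import Dict


def gene_ratios(polya_tpm: Dict[str, float],
                ribo_tpm: Dict[str, float]) -> Dict[str, float]:
    """Per-gene ratio polyA / rRNA-depleted from a paired calibration dataset.

    Assumption: both inputs were quantified against the same reference
    transcript set. Genes that are missing from either input, or that have
    zero expression in either, get no ratio.
    """
    ratios = {}
    for gene, polya_value in polya_tpm.items():
        ribo_value = ribo_tpm.get(gene, 0.0)
        if polya_value > 0 and ribo_value > 0:
            ratios[gene] = polya_value / ribo_value
    return ratios


def apply_correction(ribo_tpm: Dict[str, float],
                     ratios: Dict[str, float]) -> Dict[str, float]:
    """Scale rRNA-depleted estimates by the per-gene ratio.

    Assumption: genes with no ratio keep their original estimate
    (equivalent to a ratio of 1.0).
    """
    return {gene: value * ratios.get(gene, 1.0)
            for gene, value in ribo_tpm.items()}


# Illustrative, made-up values; these are not data from the article.
calibration_polya = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 0.0}
calibration_ribo = {"GENE_A": 50.0, "GENE_B": 100.0, "GENE_C": 10.0}
new_ribo_sample = {"GENE_A": 10.0, "GENE_B": 20.0, "GENE_C": 5.0}

example_ratios = gene_ratios(calibration_polya, calibration_ribo)
corrected_sample = apply_correction(new_ribo_sample, example_ratios)
```

A per-gene ratio is used here only because it is the simplest reading of "ratio-based correction"; the published method may define or aggregate the ratio differently, so the full article should be consulted before reusing this on real data.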