Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries
Main authors: | Bush, Stephen J.; McCulloch, Mary E. B.; Summers, Kim M.; Hume, David A.; Clark, Emily L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470212/ https://www.ncbi.nlm.nih.gov/pubmed/28610557 http://dx.doi.org/10.1186/s12859-017-1714-9 |
_version_ | 1783243732689092608 |
---|---|
author | Bush, Stephen J.; McCulloch, Mary E. B.; Summers, Kim M.; Hume, David A.; Clark, Emily L. |
collection | PubMed |
description | BACKGROUND: The availability of fast alignment-free algorithms has greatly reduced the computational burden of RNA-seq processing, especially for relatively poorly assembled genomes. Using these approaches, previous RNA-seq datasets could potentially be processed and integrated with newly sequenced libraries. Confounding factors in such integration include sequencing depth and methods of RNA extraction and selection. Different selection methods (typically, either polyA-selection or rRNA-depletion) omit different RNAs, resulting in different fractions of the transcriptome being sequenced. In particular, rRNA-depleted libraries sample a broader fraction of the transcriptome than polyA-selected libraries. This study aimed to develop a systematic means of accounting for library type that allows data from these two methods to be compared. RESULTS: The method was developed by comparing two RNA-seq datasets from ovine macrophages, identical except for RNA selection method. Gene-level expression estimates were obtained using a two-part process centred on the high-speed transcript quantification tool Kallisto. Firstly, a set of reference transcripts was defined that constitute a standardised RNA space, with expression from both datasets quantified against it. Secondly, a simple ratio-based correction was applied to the rRNA-depleted estimates. The outcome is an almost perfect correlation between gene expression estimates, independent of library type and across the full range of levels of expression. CONCLUSION: A combination of reference transcriptome filtering and a ratio-based correction can create equivalent expression profiles from both polyA-selected and rRNA-depleted libraries. This approach will allow meta-analysis and integration of existing RNA-seq data into transcriptional atlas projects. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1714-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5470212 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5470212 2017-06-19 Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries Bush, Stephen J. McCulloch, Mary E. B. Summers, Kim M. Hume, David A. Clark, Emily L. BMC Bioinformatics Methodology Article BioMed Central 2017-06-13 /pmc/articles/PMC5470212/ /pubmed/28610557 http://dx.doi.org/10.1186/s12859-017-1714-9 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Integration of quantitated expression estimates from polyA-selected and rRNA-depleted RNA-seq libraries |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5470212/ https://www.ncbi.nlm.nih.gov/pubmed/28610557 http://dx.doi.org/10.1186/s12859-017-1714-9 |
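The abstract describes the second step only as "a simple ratio-based correction" applied to the rRNA-depleted estimates. The Python sketch below shows one way such a correction could work. It is not the authors' implementation. It assumes that step one (quantifying both library types with Kallisto against the same filtered reference transcript set) has already produced per-gene TPM tables, that paired samples sequenced with both selection methods are available, and that one multiplicative factor per gene is appropriate. The function names, the median across pairs and the pseudocount are hypothetical choices.

```python
from statistics import median

# Hypothetical sketch, not the published method: per-gene TPM tables are
# plain dicts {gene_id: TPM}; the median and pseudocount are assumptions.


def correction_ratios(polya_samples, ribo_samples, pseudocount=0.01):
    """Estimate one polyA / rRNA-depleted ratio per gene from paired samples.

    polya_samples[i] and ribo_samples[i] are assumed to come from the same
    biological sample, quantified against the same reference transcript set,
    so every table is assumed to contain the same gene IDs.
    """
    if not polya_samples or len(polya_samples) != len(ribo_samples):
        raise ValueError("need the same, non-zero number of paired samples")
    ratios = {}
    for gene in polya_samples[0]:
        # Ratio for each pair; the pseudocount avoids division by zero.
        per_pair = [
            (pa[gene] + pseudocount) / (rd[gene] + pseudocount)
            for pa, rd in zip(polya_samples, ribo_samples)
        ]
        # The median limits the influence of a single outlying pair.
        ratios[gene] = median(per_pair)
    return ratios


def apply_correction(ribo_tpm, ratios):
    """Rescale rRNA-depleted estimates; genes without a ratio stay unchanged."""
    return {gene: tpm * ratios.get(gene, 1.0) for gene, tpm in ribo_tpm.items()}


if __name__ == "__main__":
    polya = [{"geneA": 10.0, "geneB": 90.0}, {"geneA": 30.0, "geneB": 70.0}]
    ribo = [{"geneA": 5.0, "geneB": 45.0}, {"geneA": 10.0, "geneB": 35.0}]
    ratios = correction_ratios(polya, ribo, pseudocount=0.0)
    print(ratios)  # {'geneA': 2.5, 'geneB': 2.0}
    print(apply_correction({"geneA": 4.0, "geneB": 10.0}, ratios))  # {'geneA': 10.0, 'geneB': 20.0}
```

Because each gene is rescaled independently, the corrected values generally no longer sum to one million, so they may need renormalising if strict TPM units are required downstream.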