Assessing the consistency of public human tissue RNA-seq data sets

Bibliographic Details
Main Authors: Danielsson, Frida; James, Tojo; Gomez-Cabrero, David; Huss, Mikael
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652619/
https://www.ncbi.nlm.nih.gov/pubmed/25829468
http://dx.doi.org/10.1093/bib/bbv017
author Danielsson, Frida
James, Tojo
Gomez-Cabrero, David
Huss, Mikael
collection PubMed
description Sequencing-based gene expression methods like RNA-sequencing (RNA-seq) have become increasingly common, but it is often claimed that results obtained in different studies are not comparable owing to the influence of laboratory batch effects, differences in RNA extraction and sequencing library preparation methods and bioinformatics processing pipelines. It would be unfortunate if different experiments were in fact incomparable, as there is great promise in data fusion and meta-analysis applied to sequencing data sets. We therefore compared reported gene expression measurements for ostensibly similar samples (specifically, human brain, heart and kidney samples) in several different RNA-seq studies to assess their overall consistency and to examine the factors contributing most to systematic differences. The same comparisons were also performed after preprocessing all data in a consistent way, eliminating potential bias from bioinformatics pipelines. We conclude that published human tissue RNA-seq expression measurements appear relatively consistent in the sense that samples cluster by tissue rather than laboratory of origin given simple preprocessing transformations. The article is supplemented by a detailed walkthrough with embedded R code and figures.
format Online
Article
Text
id pubmed-4652619
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4652619 2015-11-25 Assessing the consistency of public human tissue RNA-seq data sets Danielsson, Frida; James, Tojo; Gomez-Cabrero, David; Huss, Mikael Brief Bioinform Papers Oxford University Press 2015-11 2015-03-30 /pmc/articles/PMC4652619/ /pubmed/25829468 http://dx.doi.org/10.1093/bib/bbv017 Text en © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Assessing the consistency of public human tissue RNA-seq data sets
topic Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652619/
https://www.ncbi.nlm.nih.gov/pubmed/25829468
http://dx.doi.org/10.1093/bib/bbv017