How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets


Bibliographic Details
Main authors: Peixoto, Lucia; Risso, Davide; Poplawski, Shane G.; Wimmer, Mathieu E.; Speed, Terence P.; Wood, Marcelo A.; Abel, Ted
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects: Survey and Summary
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652761/
https://www.ncbi.nlm.nih.gov/pubmed/26202970
http://dx.doi.org/10.1093/nar/gkv736
author Peixoto, Lucia
Risso, Davide
Poplawski, Shane G.
Wimmer, Mathieu E.
Speed, Terence P.
Wood, Marcelo A.
Abel, Ted
collection PubMed
description The sequencing of the full transcriptome (RNA-seq) has become the preferred choice for the measurement of genome-wide gene expression. Despite its widespread use, challenges remain in RNA-seq data analysis. One often-overlooked aspect is normalization. Although a variety of factors or ‘batch effects’ can contribute unwanted variation to the data, commonly used RNA-seq normalization methods only correct for sequencing depth. The study of gene expression is particularly problematic when it is influenced simultaneously by a variety of biological factors in addition to the one of interest. Using examples from experimental neuroscience, we show that batch effects can dominate the signal of interest and that the choice of normalization method affects the power and reproducibility of the results. While commonly used global normalization methods are not able to adequately normalize the data, more recently developed RNA-seq normalization methods can. We focus on one particular method, RUVSeq, and show that it is able to increase the power and biological insight of the results. Finally, we provide a tutorial outlining the implementation of RUVSeq normalization that is applicable to a broad range of studies as well as meta-analysis of publicly available data.
format Online
Article
Text
id pubmed-4652761
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4652761 2015-11-25
How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets
Peixoto, Lucia; Risso, Davide; Poplawski, Shane G.; Wimmer, Mathieu E.; Speed, Terence P.; Wood, Marcelo A.; Abel, Ted
Nucleic Acids Res
Survey and Summary
Oxford University Press 2015-09-18 2015-07-21
/pmc/articles/PMC4652761/ /pubmed/26202970 http://dx.doi.org/10.1093/nar/gkv736
Text en
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets
topic Survey and Summary
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652761/
https://www.ncbi.nlm.nih.gov/pubmed/26202970
http://dx.doi.org/10.1093/nar/gkv736