
Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful...


Bibliographic Details
Main Authors: Liu, Ruijie, Holik, Aliaksei Z., Su, Shian, Jansz, Natasha, Chen, Kelan, Leong, Huei San, Blewitt, Marnie E., Asselin-Labat, Marie-Liesse, Smyth, Gordon K., Ritchie, Matthew E.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4551905/
https://www.ncbi.nlm.nih.gov/pubmed/25925576
http://dx.doi.org/10.1093/nar/gkv412
collection PubMed
description Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package.
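As a rough illustration of the weighting idea in the description above, the sketch below scales per-observation weights by a per-sample weight and then uses the combined weights in a weighted mean. It is not the limma implementation. The function names, the toy numbers, the use of the inverse variance factor as the sample weight, and the combination by elementwise product are all assumptions made for this example.

```python
def combine_weights(obs_weights, sample_var_factors):
    """Scale gene-by-sample observation weights by per-sample weights.

    Assumption for this sketch: a sample's weight is the inverse of its
    estimated variance factor, and the two kinds of weight are combined
    by multiplication.
    """
    sample_weights = [1.0 / v for v in sample_var_factors]
    return [[w * s for w, s in zip(row, sample_weights)] for row in obs_weights]


def weighted_mean(values, weights):
    """Weighted average of one gene's values across samples."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data (hypothetical): one gene, four samples, the last sample is noisy.
log_cpm = [5.0, 5.2, 5.1, 8.0]
obs_w = [[1.0, 1.0, 1.0, 1.0]]        # observation-level weights
var_factors = [1.0, 1.0, 1.0, 4.0]    # sample-level variance factors

combined = combine_weights(obs_w, var_factors)
plain = weighted_mean(log_cpm, obs_w[0])      # all samples count equally
down = weighted_mean(log_cpm, combined[0])    # noisy sample down-weighted
```

In this toy case the noisy fourth sample still contributes to the estimate, but with a quarter of the influence of the others, so the down-weighted mean sits closer to the three consistent samples than the unweighted mean does.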
id pubmed-4551905
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4551905 2015-08-28 Nucleic Acids Res Methods Online Oxford University Press 2015-09-03 2015-04-29 /pmc/articles/PMC4551905/ /pubmed/25925576 http://dx.doi.org/10.1093/nar/gkv412 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods Online