
A guide to creating design matrices for gene expression experiments

Bibliographic Details
Main authors: Law, Charity W., Zeglinski, Kathleen, Dong, Xueyi, Alhamdoosh, Monther, Smyth, Gordon K., Ritchie, Matthew E.
Format: Online, Article, Text
Language: English
Published: F1000 Research Limited 2020
Subjects: Method Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7873980/
https://www.ncbi.nlm.nih.gov/pubmed/33604029
http://dx.doi.org/10.12688/f1000research.27893.1

Abstract: Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by well-described analysis pipelines. However, two crucial steps in the analysis process can be a stumbling block for many: the setup of an appropriate model via design matrices and the setup of comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue of design and contrast matrices does not currently exist. Instead, one usually searches for example case studies across different platforms and mixes and matches the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation for each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher-order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation of, and differences between, models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
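
The abstract describes two steps: encoding the model in a design matrix and encoding the comparisons of interest in a contrast matrix. The following is a minimal Python/NumPy sketch of the simplest case it mentions, a single explanatory variable. It is not code from the article, which targets R and limma (where these steps are usually done with model.matrix() and makeContrasts()). The group labels, expression values, and helper names (means_design, treatment_design, fit) are hypothetical, and the fit is plain ordinary least squares for one gene, not limma's empirical Bayes moderated statistics.

```python
import numpy as np


def means_design(groups, levels):
    """No-intercept ("means") design: one 0/1 indicator column per group level."""
    return np.array([[1.0 if g == lvl else 0.0 for lvl in levels] for g in groups])


def treatment_design(groups, levels):
    """Intercept design: column 0 is all ones (reference level = levels[0]);
    the remaining columns are indicators for the non-reference levels."""
    X = means_design(groups, levels)
    X[:, 0] = 1.0  # replace the reference-level indicator with an intercept
    return X


def fit(X, y):
    """Ordinary least-squares coefficients for a single gene."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical experiment: 3 groups, 2 replicates each, log-expression of one gene.
levels = ["A", "B", "C"]
groups = ["A", "A", "B", "B", "C", "C"]
y = np.array([5.0, 5.2, 6.1, 5.9, 4.0, 4.2])

X_means = means_design(groups, levels)
X_treat = treatment_design(groups, levels)

beta_means = fit(X_means, y)  # coefficients are the group means of A, B, C
beta_treat = fit(X_treat, y)  # mean of A, then B - A and C - A

# Contrast matrix: rows = coefficients of the means model, columns = comparisons.
contrast_names = ["B-A", "C-A", "C-B"]
contrasts = np.array([
    [-1.0, -1.0,  0.0],  # weight on coefficient A
    [ 1.0,  0.0, -1.0],  # weight on coefficient B
    [ 0.0,  1.0,  1.0],  # weight on coefficient C
])
estimates = contrasts.T @ beta_means  # one estimated difference per contrast

for name, est in zip(contrast_names, estimates):
    print(f"{name}: {est:+.2f}")
```

Both design matrices describe the same model with different parameterizations: the B-A and C-A contrast estimates from the means model equal the second and third coefficients of the intercept model, so the choice mainly affects how the coefficients are read and which contrasts must be specified explicitly.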

Journal: F1000Res; article type: Method Article; publication date: 2020-12-10
Copyright: © 2020 Law CW et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.