Inference of differentially expressed genes using generalized linear mixed models in a pairwise fashion

BACKGROUND: Technological advances involving RNA-Seq and Bioinformatics allow quantifying the transcriptional levels of genes in cells, tissues, and cell lines, permitting the identification of Differentially Expressed Genes (DEGs). DESeq2 and edgeR are well-established computational tools used for...

Bibliographic Details
Main authors: Terra Machado, Douglas; Bernardes Brustolini, Otávio José; Côrtes Martins, Yasmmin; Grivet Mattoso Maia, Marco Antonio; Ribeiro de Vasconcelos, Ana Tereza
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10078460/
https://www.ncbi.nlm.nih.gov/pubmed/37033732
http://dx.doi.org/10.7717/peerj.15145
author Terra Machado, Douglas
Bernardes Brustolini, Otávio José
Côrtes Martins, Yasmmin
Grivet Mattoso Maia, Marco Antonio
Ribeiro de Vasconcelos, Ana Tereza
author_sort Terra Machado, Douglas
collection PubMed
description BACKGROUND: Technological advances in RNA-Seq and bioinformatics make it possible to quantify the transcriptional levels of genes in cells, tissues, and cell lines, and thereby to identify Differentially Expressed Genes (DEGs). DESeq2 and edgeR are well-established computational tools for this purpose; both are based on generalized linear models (GLMs), which consider only fixed effects. However, including random effects reduces the risk of missing potential DEGs that may be essential to the biological phenomenon under investigation. Generalized linear mixed models (GLMMs) can include both kinds of effects. METHODS: We present DEGRE (Differentially Expressed Genes with Random Effects), a user-friendly tool for inferring DEGs when the experimental design of an RNA-Seq study includes both fixed effects and random effects on individuals. DEGRE preprocesses the raw count matrices, fits GLMMs to the genes, and analyzes the resulting regression coefficients with the Wald test. DEGRE offers the Benjamini-Hochberg or Bonferroni techniques for P-value adjustment. RESULTS: The datasets used to assess DEGRE were simulated so that the true DEGs are known. These datasets have fixed effects, and random effects were estimated and inserted to measure the impact of experimental designs with high biological variability. For DEG inference, preprocessing effectively prepares the data and retains overdispersed genes. The biological coefficient of variation is inferred from the count matrices to assess variability before and after preprocessing. DEGRE is computationally validated by its performance on simulated count matrices whose biological variability is related to fixed and random effects. DEGRE also provides improved assessment measures for detecting DEGs in cases with higher biological variability. We show that the preprocessing established here effectively removes technical variation from those matrices. The tool also detects new candidate DEGs in transcriptome data from patients with bipolar disorder, which makes it promising for detecting more relevant genes. CONCLUSIONS: DEGRE provides data preprocessing and applies GLMMs for DEG inference. The preprocessing efficiently removes genes that could affect the inference. The computational and biological validation of DEGRE has also shown promise in identifying possible DEGs in experiments with complex experimental designs. This tool may help handle random effects on individuals in the inference of DEGs and has the potential to uncover new DEGs of interest for further biological investigation.
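The METHODS summary above names a Wald test on regression coefficients followed by Benjamini-Hochberg or Bonferroni adjustment of the P-values. As a minimal illustration only, those two steps could be sketched in Python as follows. This is not DEGRE's code: the function names, the normal approximation used for the Wald statistic, and the example coefficients and standard errors are all assumptions made for this sketch, and the GLMM fitting itself is not shown.

```python
import math


def wald_p_value(coef, std_err):
    """Two-sided P-value for H0: coef == 0 (assumed normal approximation)."""
    z = coef / std_err
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P-value by m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene (coefficient, standard error) pairs.
    fits = {"geneA": (1.8, 0.4), "geneB": (0.3, 0.5), "geneC": (-1.1, 0.45)}
    raw = [wald_p_value(c, se) for c, se in fits.values()]
    for gene, p, q in zip(fits, raw, benjamini_hochberg(raw)):
        print(gene, p, q)
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Hochberg controls the false discovery rate and usually retains more genes at the same threshold.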
format Online
Article
Text
id pubmed-10078460
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10078460 2023-04-07 PeerJ Bioinformatics PeerJ Inc. 2023-04-03 /pmc/articles/PMC10078460/ /pubmed/37033732 http://dx.doi.org/10.7717/peerj.15145 Text en © 2023 Terra Machado et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Inference of differentially expressed genes using generalized linear mixed models in a pairwise fashion
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10078460/
https://www.ncbi.nlm.nih.gov/pubmed/37033732
http://dx.doi.org/10.7717/peerj.15145