
RNA-seq preprocessing and sample size considerations for gene network inference

BACKGROUND: Gene network inference (GNI) methods have the potential to reveal functional relationships between different genes and their products. Most GNI algorithms have been developed for microarray gene expression datasets and their application to RNA-seq data is relatively recent. As the charac...


Bibliographic Details
Main authors: Altay, Gökmen; Zapardiel-Gonzalo, Jose; Peters, Bjoern
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881880/
https://www.ncbi.nlm.nih.gov/pubmed/36711979
http://dx.doi.org/10.1101/2023.01.02.522518
_version_ 1784879201070874624
author Altay, Gökmen
Zapardiel-Gonzalo, Jose
Peters, Bjoern
author_facet Altay, Gökmen
Zapardiel-Gonzalo, Jose
Peters, Bjoern
author_sort Altay, Gökmen
collection PubMed
description BACKGROUND: Gene network inference (GNI) methods have the potential to reveal functional relationships between different genes and their products. Most GNI algorithms were developed for microarray gene expression datasets, and their application to RNA-seq data is relatively recent. Because the characteristics of RNA-seq data differ from those of microarray data, it remains an open question which preprocessing methods should be applied to RNA-seq data prior to GNI to attain optimal performance, and what sample size is required to obtain reliable GNI estimates. RESULTS: We ran 9,144 analyses of 7 different RNA-seq datasets to evaluate 300 different preprocessing combinations that include data transformations, normalizations and association estimators. We found that there was no single best-performing preprocessing combination but that there were several good ones. The performance varied widely across datasets, which emphasizes the importance of choosing an appropriate preprocessing configuration before GNI. Two preprocessing combinations appeared promising in general: first, log2 TPM (transcripts per million) with variance-stabilizing transformation (VST) and the Pearson correlation coefficient (PCC) as association estimator; second, raw RNA-seq count data with PCC. Along with these two, we also identified 18 other good preprocessing combinations. Any of these combinations might perform best on a given dataset. Therefore, the GNI performance of these approaches should be measured on any new dataset to select the best-performing one for it. In terms of the required biological sample size of RNA-seq data, we found that between 30 and 85 samples were required to generate reliable GNI estimates. CONCLUSIONS: This study provides practical recommendations on default choices for data preprocessing prior to GNI analysis of RNA-seq data to obtain optimal performance results.
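As a rough illustration of what one "preprocessing combination" in the description looks like in practice, the following Python sketch converts raw counts to log2 TPM and ranks gene pairs by absolute Pearson correlation. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the pseudocount of 1, and the top-k edge selection are hypothetical choices made for the example, and the VST step named in the abstract is omitted.

```python
import numpy as np

def counts_to_log2_tpm(counts, gene_lengths_bp, pseudocount=1.0):
    """Convert a genes x samples count matrix to log2(TPM + pseudocount).

    The pseudocount is an assumption of this sketch, not taken from the record.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1000.0
    # Length-normalize each gene, then rescale every sample to one million.
    rate = counts / lengths_kb[:, None]
    tpm = rate / rate.sum(axis=0, keepdims=True) * 1e6
    return np.log2(tpm + pseudocount)

def pcc_network(expr, top_k):
    """Return the top_k gene pairs ranked by |Pearson correlation|.

    expr is genes x samples; each edge is (gene_i, gene_j, r) with i < j.
    """
    corr = np.corrcoef(expr)                  # genes x genes PCC matrix
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    values = corr[rows, cols]
    order = np.argsort(np.abs(values))[::-1][:top_k]
    return [(int(rows[i]), int(cols[i]), float(values[i])) for i in order]

# Hypothetical toy data: 6 genes measured in 10 samples.
rng = np.random.default_rng(0)
counts = rng.poisson(50, size=(6, 10)) + 1
lengths = np.array([1000, 2000, 1500, 500, 3000, 1200])

log_tpm = counts_to_log2_tpm(counts, lengths)
edges = pcc_network(log_tpm, top_k=3)
```

Passing the raw `counts` matrix directly to `pcc_network` would correspond to the second combination mentioned in the abstract (raw counts with PCC). Ten samples is used only to keep the toy example small; the abstract reports that 30 to 85 samples were needed for reliable estimates.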
format Online
Article
Text
id pubmed-9881880
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9881880 2023-01-28 RNA-seq preprocessing and sample size considerations for gene network inference Altay, Gökmen Zapardiel-Gonzalo, Jose Peters, Bjoern bioRxiv Article Cold Spring Harbor Laboratory 2023-01-03 /pmc/articles/PMC9881880/ /pubmed/36711979 http://dx.doi.org/10.1101/2023.01.02.522518 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Altay, Gökmen
Zapardiel-Gonzalo, Jose
Peters, Bjoern
RNA-seq preprocessing and sample size considerations for gene network inference
title RNA-seq preprocessing and sample size considerations for gene network inference
title_full RNA-seq preprocessing and sample size considerations for gene network inference
title_fullStr RNA-seq preprocessing and sample size considerations for gene network inference
title_full_unstemmed RNA-seq preprocessing and sample size considerations for gene network inference
title_short RNA-seq preprocessing and sample size considerations for gene network inference
title_sort rna-seq preprocessing and sample size considerations for gene network inference
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881880/
https://www.ncbi.nlm.nih.gov/pubmed/36711979
http://dx.doi.org/10.1101/2023.01.02.522518
work_keys_str_mv AT altaygokmen rnaseqpreprocessingandsamplesizeconsiderationsforgenenetworkinference
AT zapardielgonzalojose rnaseqpreprocessingandsamplesizeconsiderationsforgenenetworkinference
AT petersbjoern rnaseqpreprocessingandsamplesizeconsiderationsforgenenetworkinference