RNA-seq preprocessing and sample size considerations for gene network inference
Main authors: | Altay, Gökmen; Zapardiel-Gonzalo, Jose; Peters, Bjoern |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881880/ ; https://www.ncbi.nlm.nih.gov/pubmed/36711979 ; http://dx.doi.org/10.1101/2023.01.02.522518 |
author | Altay, Gökmen; Zapardiel-Gonzalo, Jose; Peters, Bjoern |
collection | PubMed |
description | BACKGROUND: Gene network inference (GNI) methods have the potential to reveal functional relationships between different genes and their products. Most GNI algorithms have been developed for microarray gene expression datasets, and their application to RNA-seq data is relatively recent. Because the characteristics of RNA-seq data differ from those of microarray data, it remains an open question which preprocessing methods should be applied to RNA-seq data prior to GNI to attain optimal performance, and what sample size is required to obtain reliable GNI estimates. RESULTS: We ran 9144 analyses of 7 different RNA-seq datasets to evaluate 300 different preprocessing combinations that include data transformations, normalizations and association estimators. We found that there was no single best-performing preprocessing combination, but there were several good ones. Performance varied widely across datasets, which emphasizes the importance of choosing an appropriate preprocessing configuration before GNI. Two preprocessing combinations appeared promising in general: first, log2 TPM (transcripts per million) with variance-stabilizing transformation (VST) and the Pearson correlation coefficient (PCC) as association estimator; second, raw RNA-seq count data with PCC. Along with these two, we also identified 18 other good preprocessing combinations. Any of these combinations might perform best on a given dataset. Therefore, the GNI performance of these approaches should be measured on any new dataset to select the best-performing one for it. In terms of the required biological sample size of RNA-seq data, we found that between 30 and 85 samples were required to generate reliable GNI estimates. CONCLUSIONS: This study provides practical recommendations on default choices for data preprocessing prior to GNI analysis of RNA-seq data to obtain optimal performance. |
id | pubmed-9881880 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | bioRxiv Article; Cold Spring Harbor Laboratory, 2023-01-03 (record pubmed-9881880, 2023-01-28) |
license | This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | RNA-seq preprocessing and sample size considerations for gene network inference |
topic | Article |
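The two default preprocessing combinations named in the abstract (raw counts with PCC; log2 TPM with VST and PCC) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`tpm`, `log2_transform`, `pearson`, `association_matrix`), the gene-length input in kilobases and the pseudocount of 1 are assumptions for illustration, and the VST step is omitted (a variance-stabilizing transformation such as the one in the DESeq2 R package would normally be applied). The sketch stops at the gene-gene association matrix that a GNI algorithm would take as input.

```python
import math
from itertools import combinations

# Hypothetical input layout: counts[gene] is a list of raw read counts (one per
# sample, same sample order for every gene); lengths_kb[gene] is an assumed
# effective gene length in kilobases.


def tpm(counts, lengths_kb):
    """Transcripts per million, per sample: length-normalised rate / sum of rates * 1e6."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    out = {g: [0.0] * n_samples for g in genes}
    for j in range(n_samples):
        rates = {g: counts[g][j] / lengths_kb[g] for g in genes}
        total = sum(rates.values())
        for g in genes:
            out[g][j] = rates[g] / total * 1e6 if total > 0 else 0.0
    return out


def log2_transform(matrix, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount of 1 is an assumption that avoids log2(0)."""
    return {g: [math.log2(v + pseudocount) for v in vals] for g, vals in matrix.items()}


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # PCC is undefined for a constant profile; reported as 0 here
    return sxy / math.sqrt(sxx * syy)


def association_matrix(matrix):
    """PCC for every unordered gene pair, keyed by (gene1, gene2) in sorted order."""
    return {(g1, g2): pearson(matrix[g1], matrix[g2])
            for g1, g2 in combinations(sorted(matrix), 2)}


# Toy example: 3 genes x 4 samples (made-up numbers, not data from the study).
counts = {"geneA": [10, 20, 30, 40], "geneB": [5, 10, 15, 20], "geneC": [40, 30, 20, 10]}
lengths_kb = {"geneA": 1.0, "geneB": 2.0, "geneC": 1.5}

# Combination 2: raw counts + PCC (geneB is exactly half of geneA, so r = 1).
raw_pcc = association_matrix(counts)

# Combination 1 without its VST step: log2(TPM + 1) + PCC.
log_tpm_pcc = association_matrix(log2_transform(tpm(counts, lengths_kb)))
```

Because the abstract reports that no single combination performed best on every dataset, a workflow along these lines would compute several candidate association matrices and compare their GNI performance on each new dataset; how that performance is scored is not specified in this record.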