Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data

MOTIVATION: Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns across a group of samples. Coordinated expression of genes may indicate that they are controlled by the same transcriptional regulatory program, or involved in common biol...

Bibliographic Details
Main Authors: Hsieh, Ping-Han, Lopes-Ramos, Camila Miranda, Zucknick, Manuela, Sandve, Geir Kjetil, Glass, Kimberly, Kuijjer, Marieke Lydia
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10598588/
https://www.ncbi.nlm.nih.gov/pubmed/37802917
http://dx.doi.org/10.1093/bioinformatics/btad610
_version_ 1785125588035436544
author Hsieh, Ping-Han
Lopes-Ramos, Camila Miranda
Zucknick, Manuela
Sandve, Geir Kjetil
Glass, Kimberly
Kuijjer, Marieke Lydia
author_sort Hsieh, Ping-Han
collection PubMed
description MOTIVATION: Gene co-expression measurements are widely used in computational biology to identify coordinated expression patterns across a group of samples. Coordinated expression of genes may indicate that they are controlled by the same transcriptional regulatory program, or involved in common biological processes. Gene co-expression is generally estimated from RNA-Sequencing data, which are commonly normalized to remove technical variability. Here, we demonstrate that certain normalization methods, in particular quantile-based methods, can introduce false-positive associations between genes. These false-positive associations can consequently hamper downstream co-expression network analysis. Quantile-based normalization can, however, be extremely powerful. In particular, when preprocessing large-scale heterogeneous data, quantile-based normalization methods such as smooth quantile normalization can be applied to remove technical variability while maintaining global differences in expression for samples with different biological attributes. RESULTS: We developed SNAIL (Smooth-quantile Normalization Adaptation for the Inference of co-expression Links), a normalization method based on smooth quantile normalization specifically designed for modeling of co-expression measurements. We show that SNAIL avoids formation of false-positive associations in co-expression as well as in downstream network analyses. Using SNAIL, one can avoid arbitrary gene filtering and retain associations to genes that only express in small subgroups of samples. This highlights the method’s potential future impact on network modeling and other association-based approaches in large-scale heterogeneous data. AVAILABILITY AND IMPLEMENTATION: The implementation of the SNAIL algorithm and code to reproduce the analyses described in this work can be found in the GitHub repository https://github.com/kuijjerlab/PySNAIL.
format Online
Article
Text
id pubmed-10598588
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10598588 2023-10-26 Bioinformatics Original Paper Oxford University Press 2023-10-06 /pmc/articles/PMC10598588/ /pubmed/37802917 http://dx.doi.org/10.1093/bioinformatics/btad610 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Adjustment of spurious correlations in co-expression measurements from RNA-Sequencing data
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10598588/
https://www.ncbi.nlm.nih.gov/pubmed/37802917
http://dx.doi.org/10.1093/bioinformatics/btad610