Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis

MOTIVATION: Identification of system-wide causal relationships can contribute to our understanding of long-distance, intercellular signalling in biological organisms. Dynamic transcriptome analysis holds great potential to uncover coordinated biological processes between organs. However, many existi...

Bibliographic Details
Main authors: Heerah, Sachin; Molinari, Roberto; Guerrier, Stéphane; Marshall-Colon, Amy
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8388030/
https://www.ncbi.nlm.nih.gov/pubmed/33693548
http://dx.doi.org/10.1093/bioinformatics/btab126
_version_ 1783742562293514240
author Heerah, Sachin
Molinari, Roberto
Guerrier, Stéphane
Marshall-Colon, Amy
author_facet Heerah, Sachin
Molinari, Roberto
Guerrier, Stéphane
Marshall-Colon, Amy
author_sort Heerah, Sachin
collection PubMed
description MOTIVATION: Identification of system-wide causal relationships can contribute to our understanding of long-distance, intercellular signalling in biological organisms. Dynamic transcriptome analysis holds great potential to uncover coordinated biological processes between organs. However, many existing dynamic transcriptome studies are characterized by sparse and often unevenly spaced time points that make the identification of causal relationships across organs analytically challenging. Application of existing statistical models, designed for regular time series with abundant time points, to sparse data may fail to reveal biologically significant, causal relationships. With increasing research interest in biological time series data, there is a need for new statistical methods that are able to determine causality within and between time series data sets. Here, a statistical framework was developed to identify (Granger) causal gene-gene relationships of unevenly spaced, multivariate time series data from two different tissues of Arabidopsis thaliana in response to a nitrogen signal. RESULTS: This work delivers a statistical approach for modelling irregularly sampled bivariate signals which embeds functions from the domain of engineering that allow to adapt the model’s dependence structure to the specific sampling time. Using maximum-likelihood to estimate the parameters of this model for each bivariate time series, it is then possible to use bootstrap procedures for small samples (or asymptotics for large samples) in order to test for Granger-Causality. When applied to the A.thaliana data, the proposed approach produced 3078 significant interactions, in which 2012 interactions have root causal genes and 1066 interactions have shoot causal genes. Many of the predicted causal and target genes are known players in local and long-distance nitrogen signalling, including genes encoding transcription factors, hormones and signalling peptides. 
Of the 1007 total causal genes (either organ), 384 are either known or predicted mobile transcripts, suggesting that the identified causal genes may be directly involved in long-distance nitrogen signalling through intercellular interactions. The model predictions and subsequent network analysis identified nitrogen-responsive genes that can be further tested for their specific roles in long-distance nitrogen signalling. AVAILABILITY AND IMPLEMENTATION: The method was developed with the R statistical software and is made available through the R package ‘irg’ hosted on the GitHub repository https://github.com/SMAC-Group/irg where also a running example vignette can be found (https://smac-group.github.io/irg/articles/vignette.html). A few signals from the original data set are made available in the package as an example to apply the method and the complete A.thaliana data can be found at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE97500. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8388030
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-83880302021-08-26 Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis Heerah, Sachin Molinari, Roberto Guerrier, Stéphane Marshall-Colon, Amy Bioinformatics Original Papers MOTIVATION: Identification of system-wide causal relationships can contribute to our understanding of long-distance, intercellular signalling in biological organisms. Dynamic transcriptome analysis holds great potential to uncover coordinated biological processes between organs. However, many existing dynamic transcriptome studies are characterized by sparse and often unevenly spaced time points that make the identification of causal relationships across organs analytically challenging. Application of existing statistical models, designed for regular time series with abundant time points, to sparse data may fail to reveal biologically significant, causal relationships. With increasing research interest in biological time series data, there is a need for new statistical methods that are able to determine causality within and between time series data sets. Here, a statistical framework was developed to identify (Granger) causal gene-gene relationships of unevenly spaced, multivariate time series data from two different tissues of Arabidopsis thaliana in response to a nitrogen signal. RESULTS: This work delivers a statistical approach for modelling irregularly sampled bivariate signals which embeds functions from the domain of engineering that allow to adapt the model’s dependence structure to the specific sampling time. Using maximum-likelihood to estimate the parameters of this model for each bivariate time series, it is then possible to use bootstrap procedures for small samples (or asymptotics for large samples) in order to test for Granger-Causality. 
When applied to the A.thaliana data, the proposed approach produced 3078 significant interactions, in which 2012 interactions have root causal genes and 1066 interactions have shoot causal genes. Many of the predicted causal and target genes are known players in local and long-distance nitrogen signalling, including genes encoding transcription factors, hormones and signalling peptides. Of the 1007 total causal genes (either organ), 384 are either known or predicted mobile transcripts, suggesting that the identified causal genes may be directly involved in long-distance nitrogen signalling through intercellular interactions. The model predictions and subsequent network analysis identified nitrogen-responsive genes that can be further tested for their specific roles in long-distance nitrogen signalling. AVAILABILITY AND IMPLEMENTATION: The method was developed with the R statistical software and is made available through the R package ‘irg’ hosted on the GitHub repository https://github.com/SMAC-Group/irg where also a running example vignette can be found (https://smac-group.github.io/irg/articles/vignette.html). A few signals from the original data set are made available in the package as an example to apply the method and the complete A.thaliana data can be found at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE97500. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-03-08 /pmc/articles/PMC8388030/ /pubmed/33693548 http://dx.doi.org/10.1093/bioinformatics/btab126 Text en © The Author(s) 2021. Published by Oxford University Press. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Heerah, Sachin
Molinari, Roberto
Guerrier, Stéphane
Marshall-Colon, Amy
Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis
title Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis
title_full Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis
title_fullStr Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis
title_full_unstemmed Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis
title_short Granger-causal testing for irregularly sampled time series with application to nitrogen signalling in Arabidopsis
title_sort granger-causal testing for irregularly sampled time series with application to nitrogen signalling in arabidopsis
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8388030/
https://www.ncbi.nlm.nih.gov/pubmed/33693548
http://dx.doi.org/10.1093/bioinformatics/btab126
work_keys_str_mv AT heerahsachin grangercausaltestingforirregularlysampledtimeserieswithapplicationtonitrogensignallinginarabidopsis
AT molinariroberto grangercausaltestingforirregularlysampledtimeserieswithapplicationtonitrogensignallinginarabidopsis
AT guerrierstephane grangercausaltestingforirregularlysampledtimeserieswithapplicationtonitrogensignallinginarabidopsis
AT marshallcolonamy grangercausaltestingforirregularlysampledtimeserieswithapplicationtonitrogensignallinginarabidopsis