Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans

Many pragmatic clustering methods have been developed to group data vectors or objects into clusters so that the objects in one cluster are very similar and objects in different clusters are distinct based on some similarity measure. The availability of time course data has motivated researchers to...

Bibliographic Details
Main authors: Carey, Michelle; Wu, Shuang; Gan, Guojun; Wu, Hulin
Format: Online Article Text
Language: English
Published: KeAi Publishing 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963321/
https://www.ncbi.nlm.nih.gov/pubmed/29928719
http://dx.doi.org/10.1016/j.idm.2016.07.001
_version_ 1783325028311367680
author Carey, Michelle
Wu, Shuang
Gan, Guojun
Wu, Hulin
collection PubMed
description Many pragmatic clustering methods have been developed to group data vectors or objects into clusters so that the objects in one cluster are very similar and objects in different clusters are distinct, based on some similarity measure. The availability of time course data has motivated researchers to develop methods, such as mixture and mixed-effects modelling approaches, that incorporate the temporal information contained in the shape of the trajectory of the data. However, there is still a need for time-course clustering methods that can adequately deal with inhomogeneous clusters (some clusters are quite large and others are quite small). Here we propose two such methods, iterative hierarchical clustering (IHC) and iterative pairwise-correlation clustering (IPC). We evaluate and compare the proposed methods to the Markov Cluster Algorithm (MCL) and the generalised mixed-effects model (GMM) using simulation studies and an application to a time course gene expression data set from a study of human subjects who were challenged with a live influenza virus. We identify four types of temporal gene response modules to influenza infection in humans: single-gene modules (SGM), small-size modules (SSM), medium-size modules (MSM) and large-size modules (LSM). The LSM contain genes that perform various fundamental biological functions that are consistent across subjects. The SSM and SGM contain genes that perform either different or similar biological functions, have complex temporal responses to the virus, and are unique to each subject. We show that the temporal responses of the genes in the LSM have either simple patterns with a single peak or trough, a consequence of transient stimuli, or sustained or state-transitioning patterns pertaining to developmental cues, and that these modules can differentiate the severity of disease outcomes. Additionally, the size of gene response modules follows a power-law distribution with a consistent exponent across all subjects, which reveals the presence of universality in the underlying biological principles that generated these modules.
format Online
Article
Text
id pubmed-5963321
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher KeAi Publishing
record_format MEDLINE/PubMed
spelling pubmed-5963321 2018-06-20 Infect Dis Model Article KeAi Publishing 2016-09-02 /pmc/articles/PMC5963321/ /pubmed/29928719 http://dx.doi.org/10.1016/j.idm.2016.07.001 Text en © 2016 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963321/
https://www.ncbi.nlm.nih.gov/pubmed/29928719
http://dx.doi.org/10.1016/j.idm.2016.07.001