Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans
Many pragmatic clustering methods have been developed to group data vectors or objects into clusters so that the objects in one cluster are very similar and objects in different clusters are distinct based on some similarity measure. The availability of time course data has motivated researchers to...
Main Authors: | Carey, Michelle; Wu, Shuang; Gan, Guojun; Wu, Hulin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | KeAi Publishing 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963321/ https://www.ncbi.nlm.nih.gov/pubmed/29928719 http://dx.doi.org/10.1016/j.idm.2016.07.001 |
_version_ | 1783325028311367680 |
---|---|
author | Carey, Michelle Wu, Shuang Gan, Guojun Wu, Hulin |
author_facet | Carey, Michelle Wu, Shuang Gan, Guojun Wu, Hulin |
author_sort | Carey, Michelle |
collection | PubMed |
description | Many pragmatic clustering methods have been developed to group data vectors or objects into clusters so that the objects in one cluster are very similar and objects in different clusters are distinct based on some similarity measure. The availability of time course data has motivated researchers to develop methods, such as mixture and mixed-effects modelling approaches, that incorporate the temporal information contained in the shape of the trajectory of the data. However, there is still a need for time-course clustering methods that can adequately deal with inhomogeneous clusters (some clusters are quite large and others are quite small). Here we propose two such methods, iterative hierarchical clustering (IHC) and iterative pairwise-correlation clustering (IPC). We evaluate and compare the proposed methods to the Markov Cluster Algorithm (MCL) and the generalised mixed-effects model (GMM) using simulation studies and an application to a time course gene expression data set from a study of human subjects who were challenged with a live influenza virus. We identify four types of temporal gene response modules to influenza infection in humans: single-gene modules (SGM), small-size modules (SSM), medium-size modules (MSM) and large-size modules (LSM). The LSM contain genes that perform various fundamental biological functions that are consistent across subjects. The SSM and SGM contain genes that perform either different or similar biological functions, have complex temporal responses to the virus, and are unique to each subject. We show that the temporal responses of the genes in the LSM have either simple patterns with a single peak or trough, a consequence of transient stimuli, or sustained or state-transitioning patterns pertaining to developmental cues, and that these modules can differentiate the severity of disease outcomes. Additionally, the size of gene response modules follows a power-law distribution with a consistent exponent across all subjects, which reveals the presence of universality in the underlying biological principles that generated these modules. |
format | Online Article Text |
id | pubmed-5963321 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | KeAi Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-5963321 2018-06-20 Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans Carey, Michelle Wu, Shuang Gan, Guojun Wu, Hulin Infect Dis Model Article Many pragmatic clustering methods have been developed to group data vectors or objects into clusters so that the objects in one cluster are very similar and objects in different clusters are distinct based on some similarity measure. The availability of time course data has motivated researchers to develop methods, such as mixture and mixed-effects modelling approaches, that incorporate the temporal information contained in the shape of the trajectory of the data. However, there is still a need for time-course clustering methods that can adequately deal with inhomogeneous clusters (some clusters are quite large and others are quite small). Here we propose two such methods, iterative hierarchical clustering (IHC) and iterative pairwise-correlation clustering (IPC). We evaluate and compare the proposed methods to the Markov Cluster Algorithm (MCL) and the generalised mixed-effects model (GMM) using simulation studies and an application to a time course gene expression data set from a study of human subjects who were challenged with a live influenza virus. We identify four types of temporal gene response modules to influenza infection in humans: single-gene modules (SGM), small-size modules (SSM), medium-size modules (MSM) and large-size modules (LSM). The LSM contain genes that perform various fundamental biological functions that are consistent across subjects. The SSM and SGM contain genes that perform either different or similar biological functions, have complex temporal responses to the virus, and are unique to each subject. We show that the temporal responses of the genes in the LSM have either simple patterns with a single peak or trough, a consequence of transient stimuli, or sustained or state-transitioning patterns pertaining to developmental cues, and that these modules can differentiate the severity of disease outcomes. Additionally, the size of gene response modules follows a power-law distribution with a consistent exponent across all subjects, which reveals the presence of universality in the underlying biological principles that generated these modules. KeAi Publishing 2016-09-02 /pmc/articles/PMC5963321/ /pubmed/29928719 http://dx.doi.org/10.1016/j.idm.2016.07.001 Text en © 2016 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Carey, Michelle Wu, Shuang Gan, Guojun Wu, Hulin Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans |
title | Correlation-based iterative clustering methods for time course data: The identification of temporal gene response modules for influenza infection in humans |
title_sort | correlation-based iterative clustering methods for time course data: the identification of temporal gene response modules for influenza infection in humans |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963321/ https://www.ncbi.nlm.nih.gov/pubmed/29928719 http://dx.doi.org/10.1016/j.idm.2016.07.001 |