Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data

Gene expression time series (GETS) analysis aims to characterize sets of genes according to their longitudinal patterns of expression. Due to the large number of genes evaluated in GETS analysis, a useful strategy to summarize biological functional processes and regulatory mechanisms is through clu...

Bibliographic Details
Main authors: Nascimento, Moysés; Silva, Fabyano Fonseca e; Sáfadi, Thelma; Nascimento, Ana Carolina Campana; Ferreira, Talles Eduardo Maciel; Barroso, Laís Mayara Azevedo; Ferreira Azevedo, Camila; Guimarães, Simone Eliza Faccione; Serão, Nick Vergara Lopes
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5513449/
https://www.ncbi.nlm.nih.gov/pubmed/28715507
http://dx.doi.org/10.1371/journal.pone.0181195
author Nascimento, Moysés
Silva, Fabyano Fonseca e
Sáfadi, Thelma
Nascimento, Ana Carolina Campana
Ferreira, Talles Eduardo Maciel
Barroso, Laís Mayara Azevedo
Ferreira Azevedo, Camila
Guimarães, Simone Eliza Faccione
Serão, Nick Vergara Lopes
collection PubMed
description Gene expression time series (GETS) analysis aims to characterize sets of genes according to their longitudinal patterns of expression. Because GETS analysis evaluates a large number of genes, a useful strategy for summarizing biological functional processes and regulatory mechanisms is to cluster genes that present similar expression patterns over time. Traditional clustering methods usually ignore the challenges in GETS, such as the lack of data normality and the small number of temporal observations. Independent Component Analysis (ICA) is a statistical procedure that uses a transformation to convert raw time series data into sets of values of independent variables, which can be used for cluster analysis to identify sets of genes with similar temporal expression patterns. ICA allows clustering of short series of distribution-free data while accounting for the dependence between subsequent time points. Using simulated and real temporal RNA-seq data sets (the real data being four libraries of two pig breeds at 21, 40, 70 and 90 days of gestation), we present a methodology (ICAclust) that jointly considers ICA and a hierarchical method for clustering GETS. We compare ICAclust results with those obtained for K-means clustering. ICAclust presented, on average, an absolute gain of 5.15% over the best K-means scenario. Considering the worst scenario for K-means, the gain was 84.85% when compared with the best ICAclust result. For the real data set, genes were grouped into six distinct clusters with 89, 51, 153, 67, 40, and 58 genes, respectively. In general, the six clusters presented very distinct expression patterns. Overall, the proposed two-step clustering method (ICAclust) performed well compared to K-means, a traditional method used for cluster analysis of temporal gene expression data. In ICAclust, genes with similar expression patterns over time were clustered together.
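The two-step idea in the description (ICA first, then a hierarchical method) can be illustrated with a minimal Python sketch. This is not the authors' ICAclust implementation: the use of scikit-learn's `FastICA`, Ward linkage from SciPy, the helper name `ica_hierarchical_clusters`, the choice of two components, and the synthetic data are all assumptions made here for illustration. Each gene is treated as one sample with four time points, mirroring the four gestation days mentioned above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import FastICA


def ica_hierarchical_clusters(expr, n_components, n_clusters, seed=0):
    """Hypothetical helper: cluster genes (rows) on their ICA scores.

    expr has shape (n_genes, n_timepoints). The number of components,
    the Ward linkage and the cut into n_clusters are illustrative
    choices, not values taken from the paper.
    """
    ica = FastICA(
        n_components=n_components,
        whiten="unit-variance",
        max_iter=1000,
        random_state=seed,
    )
    scores = ica.fit_transform(expr)        # shape: (n_genes, n_components)
    tree = linkage(scores, method="ward")   # agglomerative (hierarchical) step
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic data (assumed): three temporal shapes over four time points,
# 30 genes per shape, plus Gaussian noise.
rng = np.random.default_rng(0)
patterns = np.array([
    [-1.5, -0.5, 0.5, 1.5],   # rising expression
    [1.5, 0.5, -0.5, -1.5],   # falling expression
    [-1.0, 1.0, 1.0, -1.0],   # peak at the middle time points
])
true_labels = np.repeat([0, 1, 2], 30)
expr = 3.0 * patterns[true_labels] + rng.normal(scale=0.3, size=(90, 4))

labels = ica_hierarchical_clusters(expr, n_components=2, n_clusters=3)
print(np.bincount(labels)[1:])  # cluster sizes
```

In this toy setting the three shapes are well separated, so the hierarchical step should recover them; on real RNA-seq data the number of components, the linkage and the number of clusters would all need to be chosen and validated.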
format Online
Article
Text
id pubmed-5513449
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5513449 2017-08-07 PLoS One Research Article Public Library of Science 2017-07-17 /pmc/articles/PMC5513449/ /pubmed/28715507 http://dx.doi.org/10.1371/journal.pone.0181195 Text en © 2017 Nascimento et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5513449/
https://www.ncbi.nlm.nih.gov/pubmed/28715507
http://dx.doi.org/10.1371/journal.pone.0181195