
Improving clustering with metabolic pathway data

BACKGROUND: It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses about their behavior and the biological...


Bibliographic Details
Main Authors: Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4002909/
https://www.ncbi.nlm.nih.gov/pubmed/24717120
http://dx.doi.org/10.1186/1471-2105-15-101
author Milone, Diego H
Stegmayer, Georgina
López, Mariana
Kamenetzky, Laura
Carrari, Fernando
collection PubMed
description BACKGROUND: It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses about their behavior and the biological processes involved. This knowledge is therefore used only in a second step, after the data have been clustered according to their expression patterns alone. It would thus be very useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. RESULTS: A novel training algorithm for clustering is presented, which evaluates the internal biological connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically-inspired SOM (bSOM) training were tested on two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The bSOM results show important improvements in convergence and performance over standard SOM training, in particular from the application point of view. CONCLUSIONS: Analyses of the clusters obtained with bSOM indicate that including biological information during training can increase the biological value of the clusters found with the proposed method. It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
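The abstract describes a distance between data points and neuron centroids that adds a term based on metabolic pathway information. The record does not give the formula, so the following is only a minimal sketch of the general idea, not the authors' bSOM implementation. The penalty definition (fraction of a neuron's current members sharing no pathway with the candidate item), the mixing weight `beta`, and all names and data are hypothetical.

```python
import math

def euclidean(x, w):
    """Plain Euclidean distance between a data point and a neuron centroid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))

def pathway_penalty(item, members, pathways):
    """Assumed biological term: fraction of the neuron's current members
    that share no metabolic pathway with `item` (0.0 for an empty neuron)."""
    if not members:
        return 0.0
    item_paths = pathways.get(item, set())
    unrelated = sum(
        1 for m in members if not (item_paths & pathways.get(m, set()))
    )
    return unrelated / len(members)

def best_matching_unit(item, x, weights, assignments, pathways, beta=0.5):
    """Return the index of the neuron minimizing the combined distance.
    beta = 0 reduces to the standard SOM winner search; the linear mix
    with beta is an assumption made for this illustration."""
    best, best_d = None, math.inf
    for j, w in enumerate(weights):
        # Items currently mapped to neuron j, excluding the item itself.
        members = [m for m, unit in assignments.items()
                   if unit == j and m != item]
        d = ((1 - beta) * euclidean(x, w)
             + beta * pathway_penalty(item, members, pathways))
        if d < best_d:
            best, best_d = j, d
    return best

# Toy data (hypothetical): two neurons, three items with pathway annotations.
weights = [[0.0, 0.0], [1.0, 1.0]]
pathways = {"geneA": {"TCA"}, "metB": {"TCA"}, "geneC": {"glycolysis"}}
assignments = {"metB": 1, "geneC": 0}
x_geneA = [0.4, 0.4]

standard_winner = best_matching_unit(
    "geneA", x_geneA, weights, assignments, pathways, beta=0.0)
biological_winner = best_matching_unit(
    "geneA", x_geneA, weights, assignments, pathways, beta=0.5)
```

In this toy case the expression profile alone places `geneA` closer to neuron 0, while the pathway term moves it to neuron 1, where `metB` shares its pathway. The Euclidean term is not normalized here, so in practice the two terms would need comparable scales before mixing.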
format Online
Article
Text
id pubmed-4002909
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4002909 2014-05-09 Improving clustering with metabolic pathway data Milone, Diego H Stegmayer, Georgina López, Mariana Kamenetzky, Laura Carrari, Fernando BMC Bioinformatics Research Article BioMed Central 2014-04-10 /pmc/articles/PMC4002909/ /pubmed/24717120 http://dx.doi.org/10.1186/1471-2105-15-101 Text en Copyright © 2014 Milone et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving clustering with metabolic pathway data
topic Research Article