Nearest Neighbor Networks: clustering expression data based on gene neighborhoods

BACKGROUND: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biologic...

Bibliographic Details
Main Authors: Huttenhower, Curtis; Flamholz, Avi I; Landis, Jessica N; Sahi, Sauhard; Myers, Chad L; Olszewski, Kellen L; Hibbs, Matthew A; Siemers, Nathan O; Troyanskaya, Olga G; Coller, Hilary A
Format: Text
Language: English
Published: BioMed Central 2007
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1941745/
https://www.ncbi.nlm.nih.gov/pubmed/17626636
http://dx.doi.org/10.1186/1471-2105-8-250
author Huttenhower, Curtis
Flamholz, Avi I
Landis, Jessica N
Sahi, Sauhard
Myers, Chad L
Olszewski, Kellen L
Hibbs, Matthew A
Siemers, Nathan O
Troyanskaya, Olga G
Coller, Hilary A
collection PubMed
description BACKGROUND: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). RESULTS: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. CONCLUSION: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
format Text
id pubmed-1941745
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1941745 2007-08-09 BMC Bioinformatics Software BioMed Central 2007-07-12 /pmc/articles/PMC1941745/ /pubmed/17626636 http://dx.doi.org/10.1186/1471-2105-8-250 Text en Copyright © 2007 Huttenhower et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Nearest Neighbor Networks: clustering expression data based on gene neighborhoods
topic Software
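
The abstract describes NNN as building an interaction network from mutual nearest neighborhoods and forming clusters from overlapping cliques in that network. The following is a minimal Python sketch of that general idea, not the authors' implementation. Several details are assumptions made for illustration: Pearson correlation as the similarity measure, the parameter names `k` and `clique_size`, the rule that two cliques are merged when they share `clique_size - 1` genes, and the toy expression profiles.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mutual_knn_graph(profiles, k):
    """Connect two genes only if each is among the other's k most similar genes."""
    genes = list(profiles)
    neighbors = {}
    for g in genes:
        ranked = sorted(
            (h for h in genes if h != g),
            key=lambda h: pearson(profiles[g], profiles[h]),
            reverse=True,
        )
        neighbors[g] = set(ranked[:k])
    return {g: {h for h in neighbors[g] if g in neighbors[h]} for g in genes}


def cliques_of_size(graph, size):
    """Brute-force enumeration of fully connected gene sets (fine for toy inputs only)."""
    return [
        set(c)
        for c in combinations(sorted(graph), size)
        if all(b in graph[a] for a, b in combinations(c, 2))
    ]


def merge_overlapping(cliques, min_shared):
    """Repeatedly merge any two clusters that share at least min_shared genes."""
    clusters = [set(c) for c in cliques]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if len(clusters[i] & clusters[j]) >= min_shared:
                clusters[i] |= clusters[j]
                del clusters[j]
                merged = True
                break  # restart the scan, since the indices have changed
    return clusters


def nnn_sketch(profiles, k=2, clique_size=3):
    """Mutual-neighbor graph, then cliques, then merged overlapping cliques."""
    graph = mutual_knn_graph(profiles, k)
    cliques = cliques_of_size(graph, clique_size)
    return merge_overlapping(cliques, clique_size - 1)


# Hypothetical toy data: three rising genes, three falling genes, one erratic gene.
profiles = {
    "a1": [1, 2, 3, 4, 5],
    "a2": [2, 4, 6, 8, 11],
    "a3": [1, 2, 3, 5, 5],
    "b1": [5, 4, 3, 2, 1],
    "b2": [10, 8, 6, 4, 1],
    "b3": [5, 4, 3, 1, 1],
    "x": [1, 5, 1, 5, 1],
}

clusters = nnn_sketch(profiles, k=2, clique_size=3)
print([sorted(c) for c in clusters])
```

On this toy input the rising and falling genes form two separate clusters, while gene `x` stays unclustered: its nearest neighbors do not list it among their own, so it has no edges in the mutual-neighbor graph. The brute-force clique enumeration is only suitable for a handful of genes; a real dataset would need a dedicated clique-finding routine.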