AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

BACKGROUND: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally...

Bibliographic Details
Main authors: Newman, Aaron M, Cooper, James B
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2846907/
https://www.ncbi.nlm.nih.gov/pubmed/20202218
http://dx.doi.org/10.1186/1471-2105-11-117
_version_ 1782179516551004160
author Newman, Aaron M
Cooper, James B
author_facet Newman, Aaron M
Cooper, James B
author_sort Newman, Aaron M
collection PubMed
description BACKGROUND: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. RESULTS: We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. CONCLUSIONS: By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.
format Text
id pubmed-2846907
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2846907 2010-03-30 AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number Newman, Aaron M Cooper, James B BMC Bioinformatics Methodology article
BioMed Central 2010-03-04 /pmc/articles/PMC2846907/ /pubmed/20202218 http://dx.doi.org/10.1186/1471-2105-11-117 Text en Copyright ©2010 Newman and Cooper; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology article
Newman, Aaron M
Cooper, James B
AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
title AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
title_full AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
title_fullStr AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
title_full_unstemmed AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
title_short AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number
title_sort autosome: a clustering method for identifying gene expression modules without prior knowledge of cluster number
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2846907/
https://www.ncbi.nlm.nih.gov/pubmed/20202218
http://dx.doi.org/10.1186/1471-2105-11-117
work_keys_str_mv AT newmanaaronm autosomeaclusteringmethodforidentifyinggeneexpressionmoduleswithoutpriorknowledgeofclusternumber
AT cooperjamesb autosomeaclusteringmethodforidentifyinggeneexpressionmoduleswithoutpriorknowledgeofclusternumber