
Network methods for describing sample relationships in genomic datasets: application to Huntington’s disease

BACKGROUND: Genomic datasets generated by new technologies are increasingly prevalent in disparate areas of biological research. While many studies have sought to characterize relationships among genomic features, commensurate efforts to characterize relationships among biological samples have been...


Bibliographic Details
Main authors:	Oldham, Michael C; Langfelder, Peter; Horvath, Steve
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441531/ https://www.ncbi.nlm.nih.gov/pubmed/22691535 http://dx.doi.org/10.1186/1752-0509-6-63

author	Oldham, Michael C; Langfelder, Peter; Horvath, Steve
collection	PubMed
description	BACKGROUND: Genomic datasets generated by new technologies are increasingly prevalent in disparate areas of biological research. While many studies have sought to characterize relationships among genomic features, commensurate efforts to characterize relationships among biological samples have been less common. Consequently, the full extent of sample variation in genomic studies is often under-appreciated, complicating downstream analytical tasks such as gene co-expression network analysis. RESULTS: Here we demonstrate the use of network methods for characterizing sample relationships in microarray data generated from human brain tissue. We describe an approach for identifying outlying samples that does not depend on the choice or use of clustering algorithms. We introduce a battery of measures for quantifying the consistency and integrity of sample relationships, which can be compared across disparate studies, technology platforms, and biological systems. Among these measures, we provide evidence that the correlation between the connectivity and the clustering coefficient (two important network concepts) is a sensitive indicator of homogeneity among biological samples. We also show that this measure, which we refer to as cor(K,C), can distinguish biologically meaningful relationships among subgroups of samples. Specifically, we find that cor(K,C) reveals the profound effect of Huntington’s disease on samples from the caudate nucleus relative to other brain regions. Furthermore, we find that this effect is concentrated in specific modules of genes that are naturally co-expressed in human caudate nucleus, highlighting a new strategy for exploring the effects of disease on sets of genes. CONCLUSIONS: These results underscore the importance of systematically exploring sample relationships in large genomic datasets before seeking to analyze genomic feature activity. We introduce a standardized platform for this purpose using freely available R software that has been designed to enable iterative and interactive exploration of sample networks.
format	Online Article Text
id	pubmed-3441531
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3441531 2012-09-18 BMC Syst Biol Methodology Article BioMed Central 2012-06-12 /pmc/articles/PMC3441531/ /pubmed/22691535 http://dx.doi.org/10.1186/1752-0509-6-63 Text en Copyright ©2012 Oldham et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Network methods for describing sample relationships in genomic datasets: application to Huntington’s disease
topic	Methodology Article

