Inferring Correlation Networks from Genomic Survey Data

High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies b...

Bibliographic Details
Main authors: Friedman, Jonathan, Alm, Eric J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3447976/
https://www.ncbi.nlm.nih.gov/pubmed/23028285
http://dx.doi.org/10.1371/journal.pcbi.1002687
_version_ 1782244208666476544
author Friedman, Jonathan
Alm, Eric J.
collection PubMed
description High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity.
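The compositional effect described in the abstract can be illustrated with a small simulation. The sketch below is not SparCC and is not taken from the article; it assumes independent log-normal absolute abundances and uses the number of equally abundant taxa as a crude stand-in for community diversity. The function names (`pearson`, `mean_pairwise_corr`, `simulate`) and all parameter values are hypothetical choices made for illustration. Under these assumptions, converting counts to relative fractions should push the average pairwise correlation toward roughly -1/(k-1) for k taxa, so the artifact is strong with few taxa and weak with many.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_corr(table):
    """Average Pearson correlation over all taxon pairs.

    table: list of samples, each a list of per-taxon values.
    """
    columns = list(zip(*table))
    pairs = list(combinations(range(len(columns)), 2))
    return sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)


def simulate(n_taxa, n_samples=500, seed=0):
    """Return (mean correlation of absolute abundances,
    mean correlation of relative fractions) for independent taxa."""
    rng = random.Random(seed)
    # Assumption: taxa are truly independent, log-normally distributed.
    absolute = [[rng.lognormvariate(0.0, 0.5) for _ in range(n_taxa)]
                for _ in range(n_samples)]
    # Closure: divide each sample by its total, as sequencing data effectively does.
    fractions = []
    for row in absolute:
        total = sum(row)
        fractions.append([v / total for v in row])
    return mean_pairwise_corr(absolute), mean_pairwise_corr(fractions)


if __name__ == "__main__":
    for n_taxa in (3, 50):
        abs_r, frac_r = simulate(n_taxa)
        print(f"{n_taxa} taxa: absolute {abs_r:+.3f}, fractions {frac_r:+.3f}")
```

Because the simulated taxa are independent by construction, any clearly non-zero average correlation among the fractions is an artifact of normalization, which is the kind of spurious signal the abstract warns about in low-diversity samples.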
format Online
Article
Text
id pubmed-3447976
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3447976 2012-10-01 Inferring Correlation Networks from Genomic Survey Data Friedman, Jonathan Alm, Eric J. PLoS Comput Biol Research Article
Public Library of Science 2012-09-20 /pmc/articles/PMC3447976/ /pubmed/23028285 http://dx.doi.org/10.1371/journal.pcbi.1002687 Text en © 2012 Friedman and Alm http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Inferring Correlation Networks from Genomic Survey Data
topic Research Article