Cargando…
How to normalize metatranscriptomic count data for differential expression analysis
BACKGROUND: Differential expression analysis on the basis of RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that prior normalization of the data is crucial for a reliable detection of transcriptional differences. Until now it has not been clear whether a...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
PeerJ Inc.
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5649605/ https://www.ncbi.nlm.nih.gov/pubmed/29062598 http://dx.doi.org/10.7717/peerj.3859 |
_version_ | 1783272570252951552 |
---|---|
author | Klingenberg, Heiner Meinicke, Peter |
author_facet | Klingenberg, Heiner Meinicke, Peter |
author_sort | Klingenberg, Heiner |
collection | PubMed |
description | BACKGROUND: Differential expression analysis on the basis of RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that prior normalization of the data is crucial for a reliable detection of transcriptional differences. Until now it has not been clear whether and how the transcriptomic approach can be used for differential expression analysis in metatranscriptomics. METHODS: We propose a model for differential expression in metatranscriptomics that explicitly accounts for variations in the taxonomic composition of transcripts across different samples. As a main consequence the correct normalization of metatranscriptomic count data under this model requires the taxonomic separation of the data into organism-specific bins. Then the taxon-specific scaling of organism profiles yields a valid normalization and allows us to recombine the scaled profiles into a metatranscriptomic count matrix. This matrix can then be analyzed with statistical tools for transcriptomic count data. For taxon-specific scaling and recombination of scaled counts we provide a simple R script. RESULTS: When applying transcriptomic tools for differential expression analysis directly to metatranscriptomic data with an organism-independent (global) scaling of counts the resulting differences may be difficult to interpret. The differences may correspond to changing functional profiles of the contributing organisms but may also result from a variation of taxonomic abundances. Taxon-specific scaling eliminates this variation and therefore the resulting differences actually reflect a different behavior of organisms under changing conditions. In simulation studies we show that the divergence between results from global and taxon-specific scaling can be drastic. In particular, the variation of organism abundances can imply a considerable increase of significant differences with global scaling. Also, on real metatranscriptomic data, the predictions from taxon-specific and global scaling can differ widely. Our studies indicate that in real data applications performed with global scaling it might be impossible to distinguish between differential expression in terms of transcriptomic changes and differential composition in terms of changing taxonomic proportions. CONCLUSIONS: As in transcriptomics, a proper normalization of count data is also essential for differential expression analysis in metatranscriptomics. Our model implies a taxon-specific scaling of counts for normalization of the data. The application of taxon-specific scaling consequently removes taxonomic composition variations from functional profiles and therefore provides a clear interpretation of the observed functional differences. |
format | Online Article Text |
id | pubmed-5649605 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-56496052017-10-23 How to normalize metatranscriptomic count data for differential expression analysis Klingenberg, Heiner Meinicke, Peter PeerJ Bioinformatics BACKGROUND: Differential expression analysis on the basis of RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that prior normalization of the data is crucial for a reliable detection of transcriptional differences. Until now it has not been clear whether and how the transcriptomic approach can be used for differential expression analysis in metatranscriptomics. METHODS: We propose a model for differential expression in metatranscriptomics that explicitly accounts for variations in the taxonomic composition of transcripts across different samples. As a main consequence the correct normalization of metatranscriptomic count data under this model requires the taxonomic separation of the data into organism-specific bins. Then the taxon-specific scaling of organism profiles yields a valid normalization and allows us to recombine the scaled profiles into a metatranscriptomic count matrix. This matrix can then be analyzed with statistical tools for transcriptomic count data. For taxon-specific scaling and recombination of scaled counts we provide a simple R script. RESULTS: When applying transcriptomic tools for differential expression analysis directly to metatranscriptomic data with an organism-independent (global) scaling of counts the resulting differences may be difficult to interpret. The differences may correspond to changing functional profiles of the contributing organisms but may also result from a variation of taxonomic abundances. Taxon-specific scaling eliminates this variation and therefore the resulting differences actually reflect a different behavior of organisms under changing conditions. In simulation studies we show that the divergence between results from global and taxon-specific scaling can be drastic. In particular, the variation of organism abundances can imply a considerable increase of significant differences with global scaling. Also, on real metatranscriptomic data, the predictions from taxon-specific and global scaling can differ widely. Our studies indicate that in real data applications performed with global scaling it might be impossible to distinguish between differential expression in terms of transcriptomic changes and differential composition in terms of changing taxonomic proportions. CONCLUSIONS: As in transcriptomics, a proper normalization of count data is also essential for differential expression analysis in metatranscriptomics. Our model implies a taxon-specific scaling of counts for normalization of the data. The application of taxon-specific scaling consequently removes taxonomic composition variations from functional profiles and therefore provides a clear interpretation of the observed functional differences. PeerJ Inc. 2017-10-17 /pmc/articles/PMC5649605/ /pubmed/29062598 http://dx.doi.org/10.7717/peerj.3859 Text en ©2017 Klingenberg and Meinicke http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Klingenberg, Heiner Meinicke, Peter How to normalize metatranscriptomic count data for differential expression analysis |
title | How to normalize metatranscriptomic count data for differential expression analysis |
title_full | How to normalize metatranscriptomic count data for differential expression analysis |
title_fullStr | How to normalize metatranscriptomic count data for differential expression analysis |
title_full_unstemmed | How to normalize metatranscriptomic count data for differential expression analysis |
title_short | How to normalize metatranscriptomic count data for differential expression analysis |
title_sort | how to normalize metatranscriptomic count data for differential expression analysis |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5649605/ https://www.ncbi.nlm.nih.gov/pubmed/29062598 http://dx.doi.org/10.7717/peerj.3859 |
work_keys_str_mv | AT klingenbergheiner howtonormalizemetatranscriptomiccountdatafordifferentialexpressionanalysis AT meinickepeter howtonormalizemetatranscriptomiccountdatafordifferentialexpressionanalysis |