The accuracy of absolute differential abundance analysis from relative count data

Concerns have been raised about the use of relative abundance data derived from next generation sequencing as a proxy for absolute abundances. For example, in the differential abundance setting, compositional effects in relative abundance data may give rise to spurious differences (false positives)...

Bibliographic Details
Main Authors:	Roche, Kimberly E.; Mukherjee, Sayan
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9302745/ https://www.ncbi.nlm.nih.gov/pubmed/35816553 http://dx.doi.org/10.1371/journal.pcbi.1010284

author	Roche, Kimberly E.; Mukherjee, Sayan
collection	PubMed
description	Concerns have been raised about the use of relative abundance data derived from next-generation sequencing as a proxy for absolute abundances. For example, in the differential abundance setting, compositional effects in relative abundance data may give rise to spurious differences (false positives) when considered from the absolute perspective. In practice, however, relative abundances are often transformed by renormalization strategies intended to compensate for these effects, and the scope of the practical problem remains unclear. We used simulated data to explore the consistency of differential abundance calling on renormalized relative abundances versus absolute abundances and found that, while overall consistency is high, with a median sensitivity (true positive rate) of 0.91 and specificity (1 − false positive rate) of 0.89, consistency can be much lower where there is widespread change in the abundance of features across conditions. We confirm these findings on a large number of real data sets drawn from 16S metabarcoding, expression array, bulk RNA-seq, and single-cell RNA-seq experiments, where data sets with the greatest change between experimental conditions are also those with the highest false positive rates. Finally, we evaluate the predictive utility of summary features of relative abundance data themselves. Estimates of sparsity and the prevalence of feature-level change in relative abundance data give reasonable predictions of discrepancy in differential abundance calling in simulated data and can provide useful bounds for worst-case outcomes in real data.
format	Online Article Text
id	pubmed-9302745
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9302745 2022-07-22 The accuracy of absolute differential abundance analysis from relative count data Roche, Kimberly E.; Mukherjee, Sayan PLoS Comput Biol Research Article Public Library of Science 2022-07-11 /pmc/articles/PMC9302745/ /pubmed/35816553 http://dx.doi.org/10.1371/journal.pcbi.1010284 Text en © 2022 Roche, Mukherjee https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	The accuracy of absolute differential abundance analysis from relative count data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9302745/ https://www.ncbi.nlm.nih.gov/pubmed/35816553 http://dx.doi.org/10.1371/journal.pcbi.1010284
