
Analysis and correction of compositional bias in sparse sequencing count data

BACKGROUND: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assay...


Bibliographic Details
Main authors:	Kumar, M. Senthil, Slud, Eric V., Okrah, Kwame, Hicks, Stephanie C., Hannenhalli, Sridhar, Corrada Bravo, Héctor
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6219007/ https://www.ncbi.nlm.nih.gov/pubmed/30400812 http://dx.doi.org/10.1186/s12864-018-5160-5

author	Kumar, M. Senthil Slud, Eric V. Okrah, Kwame Hicks, Stephanie C. Hannenhalli, Sridhar Corrada Bravo, Héctor
author_facet	Kumar, M. Senthil Slud, Eric V. Okrah, Kwame Hicks, Stephanie C. Hannenhalli, Sridhar Corrada Bravo, Héctor
author_sort	Kumar, M. Senthil
collection	PubMed
description	BACKGROUND: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. RESULTS: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it. CONCLUSIONS: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large-scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-5160-5) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6219007
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6219007 2018-11-08 Analysis and correction of compositional bias in sparse sequencing count data Kumar, M. Senthil Slud, Eric V. Okrah, Kwame Hicks, Stephanie C. Hannenhalli, Sridhar Corrada Bravo, Héctor BMC Genomics Methodology Article
BioMed Central 2018-11-06 /pmc/articles/PMC6219007/ /pubmed/30400812 http://dx.doi.org/10.1186/s12864-018-5160-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Kumar, M. Senthil Slud, Eric V. Okrah, Kwame Hicks, Stephanie C. Hannenhalli, Sridhar Corrada Bravo, Héctor Analysis and correction of compositional bias in sparse sequencing count data
title	Analysis and correction of compositional bias in sparse sequencing count data
title_sort	analysis and correction of compositional bias in sparse sequencing count data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6219007/ https://www.ncbi.nlm.nih.gov/pubmed/30400812 http://dx.doi.org/10.1186/s12864-018-5160-5
