Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression

BACKGROUND: Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions va...

Bibliographic Details
Main authors: Subramaniyam, Siddharth; DeJesus, Michael A.; Zaveri, Anisha; Smith, Clare M.; Baker, Richard E.; Ehrt, Sabine; Schnappinger, Dirk; Sassetti, Christopher M.; Ioerger, Thomas R.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6873424/
https://www.ncbi.nlm.nih.gov/pubmed/31752678
http://dx.doi.org/10.1186/s12859-019-3156-z
_version_ 1783472649495642112
author Subramaniyam, Siddharth
DeJesus, Michael A.
Zaveri, Anisha
Smith, Clare M.
Baker, Richard E.
Ehrt, Sabine
Schnappinger, Dirk
Sassetti, Christopher M.
Ioerger, Thomas R.
author_sort Subramaniyam, Siddharth
collection PubMed
description BACKGROUND: Deep sequencing of transposon mutant libraries (or TnSeq) is a powerful method for probing essentiality of genomic loci under different environmental conditions. Various analytical methods have been described for identifying conditionally essential genes whose tolerance for insertions varies between two conditions. However, for large-scale experiments involving many conditions, a method is needed for identifying genes that exhibit significant variability in insertions across multiple conditions. RESULTS: In this paper, we introduce a novel statistical method for identifying genes with significant variability of insertion counts across multiple conditions based on Zero-Inflated Negative Binomial (ZINB) regression. Using likelihood ratio tests, we show that the ZINB distribution fits TnSeq data better than either ANOVA or a Negative Binomial (in a generalized linear model). We use ZINB regression to identify genes required for infection of M. tuberculosis H37Rv in C57BL/6 mice. We also use ZINB to perform an analysis of genes conditionally essential in H37Rv cultures exposed to multiple antibiotics. CONCLUSIONS: Our results show that not only does ZINB generally identify most of the genes found by pairwise resampling (and vastly outperform ANOVA), but it also identifies additional genes where variability is detectable only when the magnitudes of insertion counts are treated separately from local differences in saturation, as in the ZINB model.
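The abstract's central idea, modelling the fraction of empty sites (saturation) separately from the magnitude of non-zero insertion counts and then testing for condition dependence with a likelihood ratio test, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the fixed dispersion `r`, the coarse grid-search fit, and the toy counts are all assumptions made for illustration, and the chi-square p-value is only an asymptotic approximation.

```python
import math

def nb_logpmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu and size r."""
    p = r / (r + mu)
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))

def zinb_logpmf(k, pi, mu, r):
    """Log-probability under ZINB: a structural zero with probability pi, else NB(mu, r)."""
    if k == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, r)))
    return math.log(1.0 - pi) + nb_logpmf(k, mu, r)

def zinb_loglik(counts, pi, mu, r):
    """Total log-likelihood of a list of counts under one (pi, mu, r) setting."""
    return sum(zinb_logpmf(k, pi, mu, r) for k in counts)

def fit_grid(counts, r=2.0):
    """Coarse grid search for (pi, mu); r is held fixed (an assumption of this sketch)."""
    best_ll, best_pi, best_mu = -math.inf, None, None
    for i in range(20):                 # pi = 0.00, 0.05, ..., 0.95
        pi = i * 0.05
        for mu in range(1, 201):        # mu = 1, 2, ..., 200
            ll = zinb_loglik(counts, pi, float(mu), r)
            if ll > best_ll:
                best_ll, best_pi, best_mu = ll, pi, mu
    return best_ll, best_pi, best_mu

# Hypothetical insertion counts at the TA sites of one gene in two conditions.
cond_a = [0, 0, 45, 60, 0, 52, 38, 71, 0, 49]
cond_b = [0, 0, 0, 0, 0, 3, 0, 0, 5, 0]

# Null model: one (pi, mu) shared by both conditions.
ll_null, _, _ = fit_grid(cond_a + cond_b)
# Alternative model: condition-specific (pi, mu).
ll_alt = fit_grid(cond_a)[0] + fit_grid(cond_b)[0]

# Likelihood ratio statistic; the alternative has 2 extra free parameters.
stat = 2.0 * (ll_alt - ll_null)
# Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
p_value = math.exp(-stat / 2.0)
print(f"LRT statistic = {stat:.2f}, approximate p-value = {p_value:.4g}")
```

Because both models are searched over the same grid, the alternative log-likelihood can never fall below the null one, so the statistic is non-negative. A real analysis would fit all parameters (including dispersion) by maximum likelihood, normalize counts across datasets, and correct for multiple testing across genes.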
format Online
Article
Text
id pubmed-6873424
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6873424 2019-12-12 Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression BMC Bioinformatics Methodology Article
BioMed Central 2019-11-21 /pmc/articles/PMC6873424/ /pubmed/31752678 http://dx.doi.org/10.1186/s12859-019-3156-z Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Statistical analysis of variability in TnSeq data across conditions using zero-inflated negative binomial regression
topic Methodology Article