Modeling Site-Specific Nucleotide Biases Affecting Himar1 Transposon Insertion Frequencies in TnSeq Data Sets

TnSeq is a widely used methodology for determining gene essentiality, conditional fitness, and genetic interactions in bacteria. The Himar1 transposon is restricted to insertions at TA dinucleotides, but otherwise, few site-specific biases have been identified. As a result, most analytical approache...

Bibliographic Details
Main Authors: Choudhery, Sanjeevani, Brown, A. Jacob, Akusobi, Chidiebere, Rubin, Eric J., Sassetti, Christopher M., Ioerger, Thomas R.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8525568/
https://www.ncbi.nlm.nih.gov/pubmed/34665010
http://dx.doi.org/10.1128/mSystems.00876-21
_version_ 1784585706654400512
author Choudhery, Sanjeevani
Brown, A. Jacob
Akusobi, Chidiebere
Rubin, Eric J.
Sassetti, Christopher M.
Ioerger, Thomas R.
collection PubMed
description TnSeq is a widely used methodology for determining gene essentiality, conditional fitness, and genetic interactions in bacteria. The Himar1 transposon is restricted to insertions at TA dinucleotides, but otherwise, few site-specific biases have been identified. As a result, most analytical approaches assume that insertions are expected to be randomly distributed among TA sites in nonessential regions. However, through analysis of Himar1 transposon libraries in Mycobacterium tuberculosis, we demonstrate that there are site-specific biases that affect the frequency of insertion of the Himar1 transposon at different TA sites. We use machine learning and statistical models to characterize patterns in the nucleotides surrounding TA sites that correlate with high or low insertion counts. We then develop a quantitative model based on these patterns that can be used to predict the expected counts at each TA site based on nucleotide context, which can explain up to half of the variance in insertion counts. We show that these insertion preferences exist in Himar1 TnSeq data sets from other mycobacterial and nonmycobacterial species. We present an improved method for identification of essential genes, called TTN-Fitness, that can better distinguish true biological fitness effects by comparing observed counts to expected counts based on our site-specific model of insertion preferences. Compared to previous essentiality methods, TTN-Fitness can make finer distinctions among genes whose disruption causes a fitness defect (or advantage), separating them out from the large pool of nonessentials, and is able to classify many smaller genes (with few TA sites) that were previously characterized as uncertain. 
IMPORTANCE When using the Himar1 transposon to create transposon insertion mutant libraries, it is known that the transposon is restricted to insertions at TA dinucleotide sites throughout the genome, and the absence of insertions is used to infer which genes are essential (or conditionally essential) in a bacterial organism. It is widely assumed that insertions in nonessential regions are otherwise random, and this assumption is used as the basis of several methods for statistical analysis of TnSeq data. In this paper, we show that the nucleotide sequence surrounding TA sites influences the magnitude of insertions, and these Himar1 insertion preferences (sequence biases) can partially explain why some sites have higher counts than others. We use this predictive model to make improved estimates of the fitness effects of genes, which help make finer distinctions of the phenotype and biological consequences of disruption of nonessential genes.
format Online
Article
Text
id pubmed-8525568
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-8525568 2021-10-27 Modeling Site-Specific Nucleotide Biases Affecting Himar1 Transposon Insertion Frequencies in TnSeq Data Sets Choudhery, Sanjeevani Brown, A. Jacob Akusobi, Chidiebere Rubin, Eric J. Sassetti, Christopher M. Ioerger, Thomas R. mSystems Research Article American Society for Microbiology 2021-10-19 /pmc/articles/PMC8525568/ /pubmed/34665010 http://dx.doi.org/10.1128/mSystems.00876-21 Text en Copyright © 2021 Choudhery et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title Modeling Site-Specific Nucleotide Biases Affecting Himar1 Transposon Insertion Frequencies in TnSeq Data Sets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8525568/
https://www.ncbi.nlm.nih.gov/pubmed/34665010
http://dx.doi.org/10.1128/mSystems.00876-21