Modeling Site-Specific Nucleotide Biases Affecting Himar1 Transposon Insertion Frequencies in TnSeq Data Sets

Bibliographic Details
Main authors:	Choudhery, Sanjeevani; Brown, A. Jacob; Akusobi, Chidiebere; Rubin, Eric J.; Sassetti, Christopher M.; Ioerger, Thomas R.
Format:	Online Article Text
Language:	English
Published:	American Society for Microbiology, 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8525568/ https://www.ncbi.nlm.nih.gov/pubmed/34665010 http://dx.doi.org/10.1128/mSystems.00876-21

author	Choudhery, Sanjeevani; Brown, A. Jacob; Akusobi, Chidiebere; Rubin, Eric J.; Sassetti, Christopher M.; Ioerger, Thomas R.
collection	PubMed
description	TnSeq is a widely used methodology for determining gene essentiality, conditional fitness, and genetic interactions in bacteria. The Himar1 transposon is restricted to insertions at TA dinucleotides, but otherwise, few site-specific biases have been identified. As a result, most analytical approaches assume that insertions are expected to be randomly distributed among TA sites in nonessential regions. However, through analysis of Himar1 transposon libraries in Mycobacterium tuberculosis, we demonstrate that there are site-specific biases that affect the frequency of insertion of the Himar1 transposon at different TA sites. We use machine learning and statistical models to characterize patterns in the nucleotides surrounding TA sites that correlate with high or low insertion counts. We then develop a quantitative model based on these patterns that can be used to predict the expected counts at each TA site based on nucleotide context, which can explain up to half of the variance in insertion counts. We show that these insertion preferences exist in Himar1 TnSeq data sets from other mycobacterial and nonmycobacterial species. We present an improved method for identification of essential genes, called TTN-Fitness, that can better distinguish true biological fitness effects by comparing observed counts to expected counts based on our site-specific model of insertion preferences. Compared to previous essentiality methods, TTN-Fitness can make finer distinctions among genes whose disruption causes a fitness defect (or advantage), separating them out from the large pool of nonessentials, and is able to classify many smaller genes (with few TA sites) that were previously characterized as uncertain. 
IMPORTANCE When using the Himar1 transposon to create transposon insertion mutant libraries, it is known that the transposon is restricted to insertions at TA dinucleotide sites throughout the genome, and the absence of insertions is used to infer which genes are essential (or conditionally essential) in a bacterial organism. It is widely assumed that insertions in nonessential regions are otherwise random, and this assumption is used as the basis of several methods for statistical analysis of TnSeq data. In this paper, we show that the nucleotide sequence surrounding TA sites influences the magnitude of insertions, and these Himar1 insertion preferences (sequence biases) can partially explain why some sites have higher counts than others. We use this predictive model to make improved estimates of the fitness effects of genes, which help make finer distinctions of the phenotype and biological consequences of disruption of nonessential genes.
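The description above outlines two computational ideas: turning the nucleotides around each TA site into features that a model can use to predict expected insertion counts, and comparing observed counts to those expected counts to estimate fitness. The Python sketch below illustrates both under loudly stated assumptions. It is not the TTN-Fitness implementation: the flank width, the one-hot encoding, the pseudocount, and all function names are hypothetical choices made for illustration, and the predictive model itself is not shown (expected counts are simply passed in by the caller).

```python
"""Hypothetical sketch: TA-site nucleotide context and an observed/expected ratio.

Not the TTN-Fitness implementation. Flank width, one-hot encoding, pseudocount,
and function names are illustrative assumptions.
"""
import math
from typing import List, Tuple


def ta_sites_with_context(seq: str, flank: int = 4) -> List[Tuple[int, str]]:
    """Return (0-based position, context) for every TA dinucleotide in `seq`.

    Context = `flank` nt upstream + "TA" + `flank` nt downstream.
    Sites without a full flank on both sides are skipped (an assumption).
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 1):
        has_full_flanks = i >= flank and i + 2 + flank <= len(seq)
        if seq[i:i + 2] == "TA" and has_full_flanks:
            sites.append((i, seq[i - flank:i + 2 + flank]))
    return sites


def one_hot(context: str) -> List[int]:
    """Flatten a context string into a one-hot vector over A, C, G, T.

    Bases other than A/C/G/T (e.g. N) are encoded as four zeros.
    """
    alphabet = "ACGT"
    vec: List[int] = []
    for base in context:
        vec.extend(1 if base == b else 0 for b in alphabet)
    return vec


def log2_obs_over_exp(observed: List[float], expected: List[float],
                      pseudo: float = 1.0) -> float:
    """log2 of summed observed over summed expected counts for one gene's TA sites.

    A pseudocount is added to both sums (an assumption) to avoid log(0).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return math.log2((sum(observed) + pseudo) / (sum(expected) + pseudo))


if __name__ == "__main__":
    print(ta_sites_with_context("GGCCTAGGCC", flank=2))  # [(4, 'CCTAGG')]
    print(log2_obs_over_exp([3, 5], [1, 3], pseudo=0.0))  # 1.0
```

In this sketch, the context vectors from `one_hot` would be the inputs to whatever regression or classifier predicts expected counts. A gene whose observed counts fall well below its model-predicted expected counts would then show a negative log ratio, which is the intuition behind comparing observed to expected counts rather than assuming insertions are uniform across TA sites.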
format	Online Article Text
id	pubmed-8525568
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	American Society for Microbiology
record_format	MEDLINE/PubMed
spelling	pubmed-8525568 2021-10-27; mSystems; Research Article; American Society for Microbiology 2021-10-19; Copyright © 2021 Choudhery et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title	Modeling Site-Specific Nucleotide Biases Affecting Himar1 Transposon Insertion Frequencies in TnSeq Data Sets
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8525568/ https://www.ncbi.nlm.nih.gov/pubmed/34665010 http://dx.doi.org/10.1128/mSystems.00876-21