Genome-wide analysis of fitness data and its application to improve metabolic models
BACKGROUND: Synthetic biology and related techniques enable genome scale high-throughput investigation of the effect on organism fitness of different gene knock-downs/outs and of other modifications of genomic sequence. RESULTS: We develop statistical and computational pipelines and frameworks for a...
Main Authors: | Vitkin, Edward; Solomon, Oz; Sultan, Sharon; Yakhini, Zohar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6180484/ https://www.ncbi.nlm.nih.gov/pubmed/30305012 http://dx.doi.org/10.1186/s12859-018-2341-9 |
_version_ | 1783362209583202304 |
---|---|
author | Vitkin, Edward; Solomon, Oz; Sultan, Sharon; Yakhini, Zohar |
author_facet | Vitkin, Edward; Solomon, Oz; Sultan, Sharon; Yakhini, Zohar |
author_sort | Vitkin, Edward |
collection | PubMed |
description | BACKGROUND: Synthetic biology and related techniques enable genome scale high-throughput investigation of the effect on organism fitness of different gene knock-downs/outs and of other modifications of genomic sequence. RESULTS: We develop statistical and computational pipelines and frameworks for analyzing high throughput fitness data over a genome scale set of sequence variants. Analyzing data from a high-throughput knock-down/knock-out bacterial study, we investigate differences and determinants of the effect on fitness in different conditions. Comparing fitness vectors of genes, across tens of conditions, we observe that fitness consequences strongly depend on genomic location and more weakly depend on gene sequence similarity and on functional relationships. In analyzing promoter sequences, we identified motifs associated with conditions studied in bacterial media such as Casaminos, D-glucose, Sucrose, and other sugars and amino-acid sources. We also use fitness data to infer genes associated with orphan metabolic reactions in the iJO1366 E. coli metabolic model. To do this, we developed a new computational method that integrates gene fitness and gene expression profiles within a given reaction network neighborhood to associate this reaction with a set of genes that potentially encode the catalyzing proteins. We then apply this approach to predict candidate genes for 107 orphan reactions in iJO1366. Furthermore, we validate our methodology with known reactions using a leave-one-out approach. Specifically, using top-20 candidates selected based on combined fitness and expression datasets, we correctly reconstruct 39.7% of the reactions, as compared to 33% based on fitness and to 26% based on expression separately, and to 4.02% as a random baseline. Our model improvement results include a novel association of a gene to an orphan cytosine nucleosidation reaction. CONCLUSION: Our pipeline for metabolic modeling shows a clear benefit of using fitness data for predicting genes of orphan reactions. Along with the analysis pipelines we developed, it can be used to analyze similar high-throughput data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2341-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6180484 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6180484 2018-10-18 Genome-wide analysis of fitness data and its application to improve metabolic models Vitkin, Edward Solomon, Oz Sultan, Sharon Yakhini, Zohar BMC Bioinformatics Research Article BACKGROUND: Synthetic biology and related techniques enable genome scale high-throughput investigation of the effect on organism fitness of different gene knock-downs/outs and of other modifications of genomic sequence. RESULTS: We develop statistical and computational pipelines and frameworks for analyzing high throughput fitness data over a genome scale set of sequence variants. Analyzing data from a high-throughput knock-down/knock-out bacterial study, we investigate differences and determinants of the effect on fitness in different conditions. Comparing fitness vectors of genes, across tens of conditions, we observe that fitness consequences strongly depend on genomic location and more weakly depend on gene sequence similarity and on functional relationships. In analyzing promoter sequences, we identified motifs associated with conditions studied in bacterial media such as Casaminos, D-glucose, Sucrose, and other sugars and amino-acid sources. We also use fitness data to infer genes associated with orphan metabolic reactions in the iJO1366 E. coli metabolic model. To do this, we developed a new computational method that integrates gene fitness and gene expression profiles within a given reaction network neighborhood to associate this reaction with a set of genes that potentially encode the catalyzing proteins. We then apply this approach to predict candidate genes for 107 orphan reactions in iJO1366. Furthermore, we validate our methodology with known reactions using a leave-one-out approach. Specifically, using top-20 candidates selected based on combined fitness and expression datasets, we correctly reconstruct 39.7% of the reactions, as compared to 33% based on fitness and to 26% based on expression separately, and to 4.02% as a random baseline. Our model improvement results include a novel association of a gene to an orphan cytosine nucleosidation reaction. CONCLUSION: Our pipeline for metabolic modeling shows a clear benefit of using fitness data for predicting genes of orphan reactions. Along with the analysis pipelines we developed, it can be used to analyze similar high-throughput data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2341-9) contains supplementary material, which is available to authorized users. BioMed Central 2018-10-10 /pmc/articles/PMC6180484/ /pubmed/30305012 http://dx.doi.org/10.1186/s12859-018-2341-9 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Vitkin, Edward Solomon, Oz Sultan, Sharon Yakhini, Zohar Genome-wide analysis of fitness data and its application to improve metabolic models |
title | Genome-wide analysis of fitness data and its application to improve metabolic models |
title_full | Genome-wide analysis of fitness data and its application to improve metabolic models |
title_fullStr | Genome-wide analysis of fitness data and its application to improve metabolic models |
title_full_unstemmed | Genome-wide analysis of fitness data and its application to improve metabolic models |
title_short | Genome-wide analysis of fitness data and its application to improve metabolic models |
title_sort | genome-wide analysis of fitness data and its application to improve metabolic models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6180484/ https://www.ncbi.nlm.nih.gov/pubmed/30305012 http://dx.doi.org/10.1186/s12859-018-2341-9 |
work_keys_str_mv | AT vitkinedward genomewideanalysisoffitnessdataanditsapplicationtoimprovemetabolicmodels AT solomonoz genomewideanalysisoffitnessdataanditsapplicationtoimprovemetabolicmodels AT sultansharon genomewideanalysisoffitnessdataanditsapplicationtoimprovemetabolicmodels AT yakhinizohar genomewideanalysisoffitnessdataanditsapplicationtoimprovemetabolicmodels |
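
The description field above reports a leave-one-out validation in which the top-20 candidate genes, ranked on combined fitness and expression data, are checked against the known gene of each reaction. The record does not say how candidates are scored, so the following is only a minimal Python sketch of that kind of evaluation under stated assumptions: a candidate is scored by its mean Pearson correlation with the genes in the reaction's network neighborhood, the fitness and expression scores are simply averaged, and all function names and the toy profiles are hypothetical. It is not the authors' method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def neighborhood_score(gene, neighbors, profiles):
    """Mean correlation of a candidate's profile with the neighbor genes' profiles."""
    others = [g for g in neighbors if g != gene]
    if not others:
        return 0.0
    return sum(pearson(profiles[gene], profiles[g]) for g in others) / len(others)

def top_k_candidates(neighbors, fitness, expression, k):
    """Rank all genes by the average of the fitness and expression scores (assumed rule)."""
    scores = {
        g: 0.5 * (neighborhood_score(g, neighbors, fitness)
                  + neighborhood_score(g, neighbors, expression))
        for g in fitness
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def leave_one_out_recovery(reactions, fitness, expression, k):
    """Fraction of known reactions whose hidden true gene is among the top-k candidates."""
    hits = 0
    for true_gene, neighbors in reactions.values():
        # The true gene is hidden: it is not part of the neighbor list used for scoring.
        hits += int(true_gene in top_k_candidates(neighbors, fitness, expression, k))
    return hits / len(reactions)

# Invented toy profiles (one value per condition); gC is anticorrelated with the rest.
fitness = {
    "gA": [1.0, 2.0, 3.0, 4.0],
    "gB": [1.1, 2.1, 2.9, 4.2],
    "gC": [4.0, 3.0, 2.0, 1.0],
    "gD": [0.9, 2.2, 3.1, 3.8],
}
expression = {
    "gA": [2.0, 4.0, 6.0, 8.0],
    "gB": [2.1, 3.9, 6.2, 7.9],
    "gC": [8.0, 6.0, 4.0, 2.0],
    "gD": [1.8, 4.1, 5.9, 8.3],
}
# Hypothetical known reaction R1: true gene gA, neighborhood genes gB and gD.
reactions = {"R1": ("gA", ["gB", "gD"])}

recovery = leave_one_out_recovery(reactions, fitness, expression, k=3)
print(f"top-3 recovery: {recovery:.2f}")
```

Passing the same profiles dictionary for both `fitness` and `expression` reduces this to a single-dataset ranking, which is one way to set up the kind of fitness-only versus expression-only comparison the abstract reports.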