Predicting gene knockout effects from expression data

Bibliographic Details
Main Authors:	Rosenski, Jonathan; Shifman, Sagiv; Kaplan, Tommy
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9938619/ https://www.ncbi.nlm.nih.gov/pubmed/36803845 http://dx.doi.org/10.1186/s12920-023-01446-6

author	Rosenski, Jonathan; Shifman, Sagiv; Kaplan, Tommy
collection	PubMed
description	BACKGROUND: The study of gene essentiality, which measures the importance of a gene for cell division and survival, is used to identify cancer drug targets and to understand the tissue-specific manifestation of genetic conditions. In this work, we analyze essentiality and gene expression data from over 900 cancer cell lines from the DepMap project to create predictive models of gene essentiality.
METHODS: We developed machine learning algorithms to identify those genes whose essentiality levels are explained by the expression of a small set of “modifier genes”. To identify these gene sets, we developed an ensemble of statistical tests capturing linear and non-linear dependencies. We trained several regression models predicting the essentiality of each target gene, and used an automated model selection procedure to identify the optimal model and hyperparameters. Overall, we examined linear models, gradient boosted trees, Gaussian process regression models, and deep learning networks.
RESULTS: We identified nearly 3000 genes for which we accurately predict essentiality using gene expression data of a small set of modifier genes. Our model outperforms current state-of-the-art methods both in the number of genes for which we make successful predictions and in prediction accuracy.
CONCLUSIONS: Our modeling framework avoids overfitting by identifying the small set of modifier genes, which are of clinical and genetic importance, and ignoring the expression of noisy and irrelevant genes. Doing so improves the accuracy of essentiality prediction in various conditions and provides interpretable models. Overall, we present an accurate computational approach, as well as interpretable modeling of essentiality in a wide range of cellular conditions, thus contributing to a better understanding of the molecular mechanisms that govern tissue-specific effects of genetic disease and cancer.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-023-01446-6.
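
The two-step idea in the METHODS summary (pick a small set of modifier genes for a target gene, then fit a regression model on only those genes) can be illustrated with a toy sketch. This is not the authors' pipeline: the data are synthetic, a single absolute Pearson correlation ranking stands in for their ensemble of statistical tests, a plain least-squares linear model fitted by gradient descent stands in for their model family and automated model selection, and every name here (`select_modifiers`, `fit_linear`, `demo`, the gene indices 3 and 7) is hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_modifiers(expr, target, k):
    """Rank genes (columns of expr) by |Pearson r| with target; keep the top k.

    Assumption: one linear test replaces the ensemble of tests in the abstract.
    """
    n_genes = len(expr[0])
    scores = [abs(pearson([row[g] for row in expr], target)) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def fit_linear(expr, target, genes, lr=0.05, epochs=2000):
    """Least-squares linear fit on the selected genes via batch gradient descent."""
    w = [0.0] * len(genes)
    b = 0.0
    n = len(expr)
    for _ in range(epochs):
        grad_w = [0.0] * len(genes)
        grad_b = 0.0
        for row, y in zip(expr, target):
            err = b + sum(wi * row[g] for wi, g in zip(w, genes)) - y
            grad_b += err
            for i, g in enumerate(genes):
                grad_w[i] += err * row[g]
        b -= lr * grad_b / n
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
    return w, b


def predict(expr, genes, w, b):
    """Predicted essentiality for each cell line (row of expr)."""
    return [b + sum(wi * row[g] for wi, g in zip(w, genes)) for row in expr]


def demo(seed=0):
    """Synthetic example: 200 cell lines, 30 genes, 2 true modifier genes."""
    rng = random.Random(seed)
    n_lines, n_genes = 200, 30
    expr = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_lines)]
    # Hypothetical ground truth: essentiality depends only on genes 3 and 7.
    target = [1.5 * row[3] - 2.0 * row[7] + rng.gauss(0, 0.1) for row in expr]
    train_x, test_x = expr[:150], expr[150:]
    train_y, test_y = target[:150], target[150:]
    genes = select_modifiers(train_x, train_y, k=2)  # selection uses training data only
    w, b = fit_linear(train_x, train_y, genes)
    r_test = pearson(predict(test_x, genes, w, b), test_y)
    return genes, w, b, r_test


genes, w, b, r_test = demo()
print("selected modifier genes:", sorted(genes))
print("held-out Pearson r:", round(r_test, 3))
```

Restricting the model to a few selected genes keeps the number of fitted parameters small relative to the number of cell lines, which is the overfitting argument made in the CONCLUSIONS; selecting on the training split only avoids leaking held-out information into the gene ranking.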
format	Online Article Text
id	pubmed-9938619
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9938619 2023-02-19 Predicting gene knockout effects from expression data Rosenski, Jonathan; Shifman, Sagiv; Kaplan, Tommy BMC Med Genomics Research BioMed Central 2023-02-18 /pmc/articles/PMC9938619/ /pubmed/36803845 http://dx.doi.org/10.1186/s12920-023-01446-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Predicting gene knockout effects from expression data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9938619/ https://www.ncbi.nlm.nih.gov/pubmed/36803845 http://dx.doi.org/10.1186/s12920-023-01446-6
