A new computational strategy for predicting essential genes

BACKGROUND: Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models...


Bibliographic Details
Main authors: Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880044/
https://www.ncbi.nlm.nih.gov/pubmed/24359534
http://dx.doi.org/10.1186/1471-2164-14-910
author Cheng, Jian
Wu, Wenwu
Zhang, Yinwen
Li, Xiangchen
Jiang, Xiaoqian
Wei, Gehong
Tao, Shiheng
author_sort Cheng, Jian
collection PubMed
description BACKGROUND: Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. RESULTS: We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. CONCLUSIONS: FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
format Online
Article
Text
id pubmed-3880044
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3880044 2014-01-09 A new computational strategy for predicting essential genes Cheng, Jian Wu, Wenwu Zhang, Yinwen Li, Xiangchen Jiang, Xiaoqian Wei, Gehong Tao, Shiheng BMC Genomics Methodology Article BioMed Central 2013-12-21 /pmc/articles/PMC3880044/ /pubmed/24359534 http://dx.doi.org/10.1186/1471-2164-14-910 Text en Copyright © 2013 Cheng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A new computational strategy for predicting essential genes
title_sort new computational strategy for predicting essential genes
topic Methodology Article