A new computational strategy for predicting essential genes
BACKGROUND: Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models...
Main authors: | Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880044/ https://www.ncbi.nlm.nih.gov/pubmed/24359534 http://dx.doi.org/10.1186/1471-2164-14-910 |
_version_ | 1782298035004375040 |
---|---|
author | Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng |
author_facet | Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng |
author_sort | Cheng, Jian |
collection | PubMed |
description | BACKGROUND: Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. RESULTS: We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. CONCLUSIONS: FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets. |
format | Online Article Text |
id | pubmed-3880044 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-38800442014-01-09 A new computational strategy for predicting essential genes Cheng, Jian Wu, Wenwu Zhang, Yinwen Li, Xiangchen Jiang, Xiaoqian Wei, Gehong Tao, Shiheng BMC Genomics Methodology Article BACKGROUND: Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. RESULTS: We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. CONCLUSIONS: FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. 
This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets. BioMed Central 2013-12-21 /pmc/articles/PMC3880044/ /pubmed/24359534 http://dx.doi.org/10.1186/1471-2164-14-910 Text en Copyright © 2013 Cheng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Cheng, Jian Wu, Wenwu Zhang, Yinwen Li, Xiangchen Jiang, Xiaoqian Wei, Gehong Tao, Shiheng A new computational strategy for predicting essential genes |
title | A new computational strategy for predicting essential genes |
title_full | A new computational strategy for predicting essential genes |
title_fullStr | A new computational strategy for predicting essential genes |
title_full_unstemmed | A new computational strategy for predicting essential genes |
title_short | A new computational strategy for predicting essential genes |
title_sort | new computational strategy for predicting essential genes |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880044/ https://www.ncbi.nlm.nih.gov/pubmed/24359534 http://dx.doi.org/10.1186/1471-2164-14-910 |
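
The description field in this record says FWM builds on Naïve Bayes classifiers, logistic regression, and a genetic algorithm, but the record does not state how these parts are combined. The sketch below is therefore not the authors' model. It only illustrates the general idea of a feature-weighted Naïve Bayes score, assuming Gaussian feature likelihoods and hand-picked weights; the function names, the toy data, and the weight values are all hypothetical.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and each feature's mean and variance."""
    params = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        params[label] = (n / len(y), means, variances)
    return params

def log_gaussian(v, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)

def weighted_log_score(x, params, weights, label):
    """Log prior plus the weighted sum of per-feature log likelihoods.

    A weight of 1.0 for every feature gives ordinary Naive Bayes;
    a weight of 0.0 removes that feature from the decision.
    """
    prior, means, variances = params[label]
    return math.log(prior) + sum(
        w * log_gaussian(v, m, s)
        for w, v, m, s in zip(weights, x, means, variances)
    )

def predict(x, params, weights):
    """Return the class label with the highest weighted score."""
    return max(params, key=lambda c: weighted_log_score(x, params, weights, c))

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [3.0, 1.2], [3.2, 5.1], [2.8, 2.9]]
y = [1, 1, 1, 0, 0, 0]  # 1 = essential, 0 = non-essential (illustrative labels)

params = fit_gaussian_nb(X, y)
weights = [1.0, 0.0]  # assumed weights; a real model would have to learn them
print(predict([1.1, 4.0], params, weights))
print(predict([2.9, 0.5], params, weights))
```

Down-weighting a feature in this way is one simple means of limiting the influence of redundant or misleading features, which is the kind of problem (multicollinearity, contrasting correlations) the abstract says motivated FWM.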