
An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms


Bibliographic Details
Main authors: Hua, Hong-Li, Zhang, Fa-Zhan, Labena, Abraham Alemayehu, Dong, Chuan, Jin, Yan-Ting, Guo, Feng-Biao
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5021884/
https://www.ncbi.nlm.nih.gov/pubmed/27660763
http://dx.doi.org/10.1155/2016/7639397
collection PubMed
description Investigation of essential genes is important for understanding the minimal gene set of a cell and for discovering potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning methods was introduced to predict essential genes. We focused on 25 bacteria whose essential genes have been characterized. The predictions yielded a highest area under the receiver operating characteristic (ROC) curve (AUC) of 0.9716 in tenfold cross-validation. Appropriate features were used to construct models for predictions in distantly related bacteria, and prediction accuracy was evaluated by the agreement between the predictions and the known essential genes of the target species. When making predictions across organisms, the highest AUC was 0.9552 and the average AUC was 0.8314. A recently released independent dataset from Synechococcus elongatus was used to further assess the performance of our model; the AUC of these predictions is 0.7855, which is higher than that of other methods. This research shows that features obtained solely by homology mapping can achieve results as good as, or even better than, those obtained with integrated features. The work also indicates that a machine learning-based method can assign more effective weight coefficients than an empirical formula based on biological knowledge.
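The abstract reports performance as ROC AUC under tenfold cross-validation. As a generic illustration only (not the authors' code, features, or classifier), the sketch below shows how an AUC can be computed from labeled gene scores and how sample indices can be split into ten folds. The function names `roc_auc` and `kfold_indices` and the toy data are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 = essential gene, 0 = non-essential gene (hypothetical encoding).
    scores: classifier output; higher means "more likely essential".
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Split sample indices 0..n-1 into k contiguous, near-equal folds.

    In k-fold cross-validation each fold is held out once as the test set
    while a model is trained on the remaining k-1 folds.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical toy data (not from the study): 2 essential, 2 non-essential genes.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
print([len(f) for f in kfold_indices(25, k=10)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Contiguous folds are used here for simplicity; in practice genes would usually be shuffled, and often stratified by essentiality label, before splitting, which is what library routines such as scikit-learn's `StratifiedKFold` provide.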
id pubmed-5021884
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Biomed Res Int, 2016-08-30
rights Copyright © 2016 Hong-Li Hua et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article