Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information

BACKGROUND: The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential genes discovery are labor-intensive and time-consuming. Considering t...

Bibliographic Details
Main authors: Acencio, Marcio L; Lemke, Ney
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2753850/
https://www.ncbi.nlm.nih.gov/pubmed/19758426
http://dx.doi.org/10.1186/1471-2105-10-290
author Acencio, Marcio L
Lemke, Ney
collection PubMed
description BACKGROUND: The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential genes discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes. RESULTS: We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes (network topological features, cellular compartments and biological processes) to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes that was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality. CONCLUSION: We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality.
format Text
id pubmed-2753850
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2753850 2009-09-29 Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information Acencio, Marcio L Lemke, Ney BMC Bioinformatics Methodology Article BioMed Central 2009-09-16 /pmc/articles/PMC2753850/ /pubmed/19758426 http://dx.doi.org/10.1186/1471-2105-10-290 Text en Copyright ©2009 Acencio and Lemke; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2753850/
https://www.ncbi.nlm.nih.gov/pubmed/19758426
http://dx.doi.org/10.1186/1471-2105-10-290