A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data

BACKGROUND: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-cons...

Bibliographic Details
Main Authors:	Costa, Pedro R, Acencio, Marcio L, Lemke, Ney
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045802/ https://www.ncbi.nlm.nih.gov/pubmed/21210975 http://dx.doi.org/10.1186/1471-2164-11-S5-S9

_version_	1782198872978751488
author	Costa, Pedro R Acencio, Marcio L Lemke, Ney
author_facet	Costa, Pedro R Acencio, Marcio L Lemke, Ney
author_sort	Costa, Pedro R
collection	PubMed
description	BACKGROUND: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products. RESULTS: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively. CONCLUSIONS: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale.
Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
format	Text
id	pubmed-3045802
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-30458022011-03-01 A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data Costa, Pedro R Acencio, Marcio L Lemke, Ney BMC Genomics Proceedings BACKGROUND: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products. RESULTS: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.
CONCLUSIONS: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability. BioMed Central 2010-12-22 /pmc/articles/PMC3045802/ /pubmed/21210975 http://dx.doi.org/10.1186/1471-2164-11-S5-S9 Text en Copyright ©2010 Costa et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Costa, Pedro R Acencio, Marcio L Lemke, Ney A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data
title	A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data
title_full	A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data
title_fullStr	A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data
title_full_unstemmed	A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data
title_short	A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data
title_sort	machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3045802/ https://www.ncbi.nlm.nih.gov/pubmed/21210975 http://dx.doi.org/10.1186/1471-2164-11-S5-S9
work_keys_str_mv	AT costapedror amachinelearningapproachforgenomewidepredictionofmorbidanddruggablehumangenesbasedonsystemsleveldata AT acenciomarciol amachinelearningapproachforgenomewidepredictionofmorbidanddruggablehumangenesbasedonsystemsleveldata AT lemkeney amachinelearningapproachforgenomewidepredictionofmorbidanddruggablehumangenesbasedonsystemsleveldata AT costapedror machinelearningapproachforgenomewidepredictionofmorbidanddruggablehumangenesbasedonsystemsleveldata AT acenciomarciol machinelearningapproachforgenomewidepredictionofmorbidanddruggablehumangenesbasedonsystemsleveldata AT lemkeney machinelearningapproachforgenomewidepredictionofmorbidanddruggablehumangenesbasedonsystemsleveldata
