
Candidate gene prioritization by network analysis of differential expression using machine learning approaches


Bibliographic Details
Main Authors:	Nitsch, Daniela; Gonçalves, Joana P; Ojeda, Fabian; de Moor, Bart; Moreau, Yves
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2945940/ https://www.ncbi.nlm.nih.gov/pubmed/20840752 http://dx.doi.org/10.1186/1471-2105-11-460

author	Nitsch, Daniela; Gonçalves, Joana P; Ojeda, Fabian; de Moor, Bart; Moreau, Yves
collection	PubMed
description	BACKGROUND: Discovering novel disease genes remains challenging for diseases for which no prior knowledge, such as known disease genes or disease-related pathways, is available. Genetic studies frequently produce large lists of candidate genes, of which only a few can be followed up for further investigation. We recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge with experimental data on differential gene expression between affected and healthy individuals. To improve the performance of this prioritization strategy, we extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. RESULTS: We proposed three strategies for scoring disease candidate genes that rely on network-based machine learning approaches: kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison, a fourth, local measure based on the expression of the direct neighbors was also computed. We benchmarked these strategies on 40 publicly available knockout experiments in mice and assessed performance against a standard procedure in genetics that ranks candidate genes solely by their differential expression levels (Simple Expression Ranking). All four strategies could outperform this standard procedure. The best results were obtained with Heat Kernel Diffusion Ranking, which achieved an average ranking position of 8 out of 100 genes, an AUC value of 92.3%, and an error reduction of 52.8% relative to the standard procedure, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.
CONCLUSION: In this study, we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
format	Text
id	pubmed-2945940
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2945940; 2010-10-21; BMC Bioinformatics; Methodology Article; BioMed Central; 2010-09-14; /pmc/articles/PMC2945940/; /pubmed/20840752; http://dx.doi.org/10.1186/1471-2105-11-460; Text; en; Copyright ©2010 Nitsch et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Candidate gene prioritization by network analysis of differential expression using machine learning approaches
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2945940/ https://www.ncbi.nlm.nih.gov/pubmed/20840752 http://dx.doi.org/10.1186/1471-2105-11-460
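
The abstract describes scoring a candidate gene by how strongly its network neighborhood is differentially expressed, and reports an error reduction of 52.8%. The sketch below illustrates both ideas on a toy example. It is not the authors' implementation: the six-gene network, the expression values, the diffusion parameter beta, the truncated Taylor series used to approximate the heat kernel, and all function names are assumptions made for illustration. The error reduction is computed under the assumption that the error is 1 - AUC, which reproduces the reported figure from the two AUC values given in the abstract.

```python
# Toy illustration only: network, values, and names are hypothetical.

def laplacian(n, edges):
    """Build the graph Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def heat_diffusion(L, x, beta=0.5, terms=30):
    """Approximate exp(-beta * L) applied to x with a truncated Taylor series."""
    n = len(x)
    term = list(x)    # k-th series term, starting with k = 0
    total = list(x)
    for k in range(1, terms):
        L_term = [sum(L[i][j] * term[j] for j in range(n)) for i in range(n)]
        term = [-beta * v / k for v in L_term]
        total = [t + v for t, v in zip(total, term)]
    return total


def rank(candidates, scores):
    """Order candidate gene indices from highest to lowest score."""
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


def error_reduction(auc_baseline, auc_method):
    """Relative reduction in error, assuming error = 1 - AUC."""
    err_baseline = 1.0 - auc_baseline
    err_method = 1.0 - auc_method
    return (err_baseline - err_method) / err_baseline


# Gene 0 plays the role of the knockout gene: its own differential expression
# is zero, but its three neighbors (genes 1 to 3) are strongly affected.
# Gene 4 is a competing candidate with weak expression and a weak neighbor.
EDGES = [(0, 1), (0, 2), (0, 3), (4, 5)]
DIFF_EXPR = [0.0, 3.0, 2.5, 2.8, 0.5, 0.1]
CANDIDATES = [0, 4]

diffused = heat_diffusion(laplacian(len(DIFF_EXPR), EDGES), DIFF_EXPR)

print("Simple expression ranking:", rank(CANDIDATES, DIFF_EXPR))
print("Heat diffusion ranking:   ", rank(CANDIDATES, diffused))
print("Error reduction: %.1f%%" % (100 * error_reduction(0.837, 0.923)))
```

In this toy case, ranking by expression alone places gene 4 ahead of gene 0, while the diffused scores place gene 0 first because its neighbors carry a strong signal. A real analysis would use a genome-scale network and a proper matrix exponential or kernel approximation rather than this small series expansion.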
