
Benchmarking network propagation methods for disease gene identification

In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 var...


Bibliographic Details
Main Authors:	Picart-Armada, Sergio; Barrett, Steven J.; Willé, David R.; Perera-Lluna, Alexandre; Gutteridge, Alex; Dessailly, Benoit H.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6743778/ https://www.ncbi.nlm.nih.gov/pubmed/31479437 http://dx.doi.org/10.1371/journal.pcbi.1007276

collection	PubMed
description	In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks, six performance metrics and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes.
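The ideas named in the abstract (propagating seed genes over a network, counting true targets among the top suggestions, and keeping protein complexes together during cross-validation) can be illustrated with a minimal sketch. It uses random walk with restart as one common form of network propagation; it is not claimed to be any of the 12 benchmarked algorithms, nor the authors' validation scheme. All function names, the toy network, the restart value and the assumption that complexes do not overlap are hypothetical choices made for illustration only.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Toy network propagation: spread seed weight over an undirected network."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # isolated nodes: avoid division by zero
    W = adj / col_sums                       # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)       # uniform weight on the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:   # stop once the scores have settled
            return p_next
        p = p_next
    return p

def hits_at_k(scores, true_targets, exclude, k=20):
    """Count true targets among the k best-scored genes, skipping the seeds."""
    ranked = [i for i in np.argsort(-scores, kind="stable") if i not in exclude]
    return sum(1 for i in ranked[:k] if i in true_targets)

def complex_aware_folds(n_genes, complexes, n_folds, seed=0):
    """Assign genes to folds so that members of one complex share a fold.

    Assumes the complexes are disjoint (a simplification for this sketch).
    """
    rng = np.random.default_rng(seed)
    group = np.arange(n_genes)               # every gene starts as its own group
    for members in complexes:
        members = sorted(members)
        group[members] = members[0]          # label the complex by its first member
    shuffled = rng.permutation(np.unique(group))
    fold_of_group = {g: i % n_folds for i, g in enumerate(shuffled)}
    return np.array([fold_of_group[g] for g in group])

# Hypothetical toy network: a path of six genes, 0-1-2-3-4-5.
adj = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[a, b] = adj[b, a] = 1.0

scores = random_walk_with_restart(adj, seeds={0})
top1_hits = hits_at_k(scores, true_targets={1}, exclude={0}, k=1)
folds = complex_aware_folds(6, complexes=[[0, 1], [2, 3, 4]], n_folds=2)
```

Keeping each complex inside a single fold means a held-out gene cannot be recovered simply because its complex partners, which are typically direct network neighbours, were left among the seeds. That shortcut is the kind of over-optimism the abstract attributes to standard cross-validation.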
id	pubmed-6743778
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-6743778 2019-09-20 PLoS Comput Biol Research Article Public Library of Science 2019-09-03 /pmc/articles/PMC6743778/ /pubmed/31479437 http://dx.doi.org/10.1371/journal.pcbi.1007276 Text en © 2019 Picart-Armada et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
