In silico prediction of novel therapeutic targets using gene–disease association data

BACKGROUND: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerab...

Bibliographic Details
Main authors:	Ferrero, Enrico; Dunham, Ian; Sanseau, Philippe
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5576250/ https://www.ncbi.nlm.nih.gov/pubmed/28851378 http://dx.doi.org/10.1186/s12967-017-1285-6

author	Ferrero, Enrico; Dunham, Ian; Sanseau, Philippe
collection	PubMed
description	BACKGROUND: Target identification and validation is a pressing challenge in the pharmaceutical industry, with many of the programmes that fail for efficacy reasons showing poor association between the drug target and the disease. Computational prediction of successful targets could have a considerable impact on attrition rates in the drug discovery pipeline by significantly reducing the initial search space. Here, we explore whether gene–disease association data from the Open Targets platform is sufficient to predict therapeutic targets that are actively being pursued by pharmaceutical companies or are already on the market.
METHODS: To test our hypothesis, we train four different classifiers (a random forest, a support vector machine, a neural network and a gradient boosting machine) on partially labelled data and evaluate their performance using nested cross-validation and testing on an independent set. We then select the best performing model and use it to make predictions on more than 15,000 genes. Finally, we validate our predictions by mining the scientific literature for proposed therapeutic targets.
RESULTS: We observe that the data types with the best predictive power are animal models showing a disease-relevant phenotype, differential expression in diseased tissue and genetic association with the disease under investigation. On a test set, the neural network classifier achieves over 71% accuracy with an AUC of 0.76 when predicting therapeutic targets in a semi-supervised learning setting. We use this model to gain insights into current and failed programmes and to predict 1431 novel targets, of which a highly significant proportion has been independently proposed in the literature.
CONCLUSIONS: Our in silico approach shows that data linking genes and diseases is sufficient to predict novel therapeutic targets effectively and confirms that this type of evidence is essential for formulating or strengthening hypotheses in the target discovery process. Ultimately, more rapid and automated target prioritisation holds the potential to reduce both the costs and the development times associated with bringing new medicines to patients.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12967-017-1285-6) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5576250
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5576250 2017-08-30 J Transl Med Research BioMed Central 2017-08-29 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	In silico prediction of novel therapeutic targets using gene–disease association data
topic	Research
