
Learning from biomedical linked data to suggest valid pharmacogenes

BACKGROUND: A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tended to generate many false positives, computational approaches based on the use of background knowledge have been propo...


Bibliographic Details
Main Authors: Dalleau, Kevin, Marzougui, Yassine, Da Silva, Sébastien, Ringot, Patrice, Ndiaye, Ndeye Coumba, Coulet, Adrien
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399403/
https://www.ncbi.nlm.nih.gov/pubmed/28427468
http://dx.doi.org/10.1186/s13326-017-0125-1
_version_ 1783230638976925696
author Dalleau, Kevin
Marzougui, Yassine
Da Silva, Sébastien
Ringot, Patrice
Ndiaye, Ndeye Coumba
Coulet, Adrien
author_facet Dalleau, Kevin
Marzougui, Yassine
Da Silva, Sébastien
Ringot, Patrice
Ndiaye, Ndeye Coumba
Coulet, Adrien
author_sort Dalleau, Kevin
collection PubMed
description BACKGROUND: A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tended to generate many false positives, computational approaches based on background knowledge have been proposed. Until now, only molecular networks or the biomedical literature have been used, whereas many other resources are available. METHOD: We propose here to use a larger and more diverse set of resources, in the form of linked data related to genes, drugs or diseases. One advantage of linked data is that they are built on a standard framework that facilitates the joint use of various sources, and thus makes it easier to consider features of various origins. We propose a selection and linkage of data sources relevant to pharmacogenomics, including, for example, DisGeNET and ClinVar. We use machine learning to identify and prioritize the pharmacogenes most likely to be valid, given the selected linked data. This identification relies on classifying gene–drug pairs as either pharmacogenomically associated or not, and was tested with two machine learning methods (random forest and graph kernel), whose results are compared in this article. RESULTS: We assembled a set of linked data related to pharmacogenomics, comprising 2,610,793 triples from six distinct resources. Learning from these data, random forest identifies valid pharmacogenes with an F-measure of 0.73 in 10-fold cross-validation, whereas graph kernel achieves an F-measure of 0.81. A list of top candidates proposed by both approaches is provided, and how they were obtained is discussed. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13326-017-0125-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5399403
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5399403 2017-04-24 Learning from biomedical linked data to suggest valid pharmacogenes Dalleau, Kevin Marzougui, Yassine Da Silva, Sébastien Ringot, Patrice Ndiaye, Ndeye Coumba Coulet, Adrien J Biomed Semantics Research BACKGROUND: A standard task in pharmacogenomics research is identifying genes that may be involved in drug response variability, i.e., pharmacogenes. Because genomic experiments tended to generate many false positives, computational approaches based on background knowledge have been proposed. Until now, only molecular networks or the biomedical literature have been used, whereas many other resources are available. METHOD: We propose here to use a larger and more diverse set of resources, in the form of linked data related to genes, drugs or diseases. One advantage of linked data is that they are built on a standard framework that facilitates the joint use of various sources, and thus makes it easier to consider features of various origins. We propose a selection and linkage of data sources relevant to pharmacogenomics, including, for example, DisGeNET and ClinVar. We use machine learning to identify and prioritize the pharmacogenes most likely to be valid, given the selected linked data. This identification relies on classifying gene–drug pairs as either pharmacogenomically associated or not, and was tested with two machine learning methods (random forest and graph kernel), whose results are compared in this article. RESULTS: We assembled a set of linked data related to pharmacogenomics, comprising 2,610,793 triples from six distinct resources. Learning from these data, random forest identifies valid pharmacogenes with an F-measure of 0.73 in 10-fold cross-validation, whereas graph kernel achieves an F-measure of 0.81. A list of top candidates proposed by both approaches is provided, and how they were obtained is discussed.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13326-017-0125-1) contains supplementary material, which is available to authorized users. BioMed Central 2017-04-20 /pmc/articles/PMC5399403/ /pubmed/28427468 http://dx.doi.org/10.1186/s13326-017-0125-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Dalleau, Kevin
Marzougui, Yassine
Da Silva, Sébastien
Ringot, Patrice
Ndiaye, Ndeye Coumba
Coulet, Adrien
Learning from biomedical linked data to suggest valid pharmacogenes
title Learning from biomedical linked data to suggest valid pharmacogenes
title_full Learning from biomedical linked data to suggest valid pharmacogenes
title_fullStr Learning from biomedical linked data to suggest valid pharmacogenes
title_full_unstemmed Learning from biomedical linked data to suggest valid pharmacogenes
title_short Learning from biomedical linked data to suggest valid pharmacogenes
title_sort learning from biomedical linked data to suggest valid pharmacogenes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399403/
https://www.ncbi.nlm.nih.gov/pubmed/28427468
http://dx.doi.org/10.1186/s13326-017-0125-1
work_keys_str_mv AT dalleaukevin learningfrombiomedicallinkeddatatosuggestvalidpharmacogenes
AT marzouguiyassine learningfrombiomedicallinkeddatatosuggestvalidpharmacogenes
AT dasilvasebastien learningfrombiomedicallinkeddatatosuggestvalidpharmacogenes
AT ringotpatrice learningfrombiomedicallinkeddatatosuggestvalidpharmacogenes
AT ndiayendeyecoumba learningfrombiomedicallinkeddatatosuggestvalidpharmacogenes
AT couletadrien learningfrombiomedicallinkeddatatosuggestvalidpharmacogenes