
PubMed-Scale Chemical Concept Embeddings Reconstruct Physical Protein Interaction Networks

PubMed is the largest resource of curated biomedical knowledge to date, entailing more than 25 million documents. Large quantities of novel literature prevent a single expert from keeping track of all potentially relevant papers, resulting in knowledge gaps. In this article, we present CHEMMESHNET,...


Bibliographic Details
Main authors: Škrlj, Blaž, Kokalj, Enja, Lavrač, Nada
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Research Metrics and Analytics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8076635/
https://www.ncbi.nlm.nih.gov/pubmed/33928210
http://dx.doi.org/10.3389/frma.2021.644614
author Škrlj, Blaž
Kokalj, Enja
Lavrač, Nada
collection PubMed
description PubMed is the largest resource of curated biomedical knowledge to date, entailing more than 25 million documents. Large quantities of novel literature prevent a single expert from keeping track of all potentially relevant papers, resulting in knowledge gaps. In this article, we present CHEMMESHNET, a newly developed PubMed-based network comprising more than 10,000,000 associations, constructed from expert-curated MeSH annotations of chemicals based on all currently available PubMed articles. By learning latent representations of concepts in the obtained network, we demonstrate in a proof of concept study that purely literature-based representations are sufficient for the reconstruction of a large part of the currently known network of physical, empirically determined protein–protein interactions. We demonstrate that simple linear embeddings of node pairs, when coupled with a neural network–based classifier, reliably reconstruct the existing collection of empirically confirmed protein–protein interactions. Furthermore, we demonstrate how pairs of learned representations can be used to prioritize potentially interesting novel interactions based on the common chemical context. Highly ranked interactions are qualitatively inspected in terms of potential complex formation at the structural level and represent potentially interesting new knowledge. We demonstrate that two protein–protein interactions, prioritized by structure-based approaches, also emerge as probable with regard to the trained machine-learning model.
format Online
Article
Text
id pubmed-8076635
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8076635 2021-04-28 PubMed-Scale Chemical Concept Embeddings Reconstruct Physical Protein Interaction Networks Škrlj, Blaž Kokalj, Enja Lavrač, Nada Front Res Metr Anal Research Metrics and Analytics Frontiers Media S.A. 2021-04-13 /pmc/articles/PMC8076635/ /pubmed/33928210 http://dx.doi.org/10.3389/frma.2021.644614 Text en Copyright © 2021 Škrlj, Kokalj and Lavrač.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title PubMed-Scale Chemical Concept Embeddings Reconstruct Physical Protein Interaction Networks
topic Research Metrics and Analytics
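
The description in this record outlines a pipeline: learn a vector for each concept in the network, combine the vectors of two proteins into a pair representation, train a classifier to separate known interactions from non-interactions, and use its scores to rank candidate pairs. The Python sketch below illustrates that idea under stated assumptions. The toy two-dimensional vectors, the protein names P1 to P5, the element-wise product used as the pair operator (the description does not name the operator), and the single logistic unit standing in for the neural network classifier are all hypothetical. It is not the authors' implementation.

```python
import math

def pair_features(u, v):
    """Symmetric pair representation: element-wise product of two node vectors.
    The choice of operator is an assumption made for this sketch."""
    return [a * b for a, b in zip(u, v)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that the pair described by feature vector x interacts."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_logistic(X, y, epochs=200, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss.
    A single logistic unit stands in for a neural network classifier."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = predict(w, b, x) - t          # gradient of the loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Hypothetical concept embeddings (in practice learned from the literature network).
emb = {
    "P1": [1.0, 0.9],
    "P2": [0.9, 1.0],
    "P3": [-1.0, -0.8],
    "P4": [-0.9, -1.0],
    "P5": [0.8, 1.1],
}

# Hypothetical labelled pairs: 1 = known interaction, 0 = assumed non-interaction.
train_pairs = [("P1", "P2", 1), ("P3", "P4", 1), ("P1", "P3", 0), ("P2", "P4", 0)]
X = [pair_features(emb[a], emb[b]) for a, b, _ in train_pairs]
y = [label for _, _, label in train_pairs]
w, b = train_logistic(X, y)

# Rank unlabelled candidate pairs by predicted interaction probability.
candidates = [("P3", "P5"), ("P1", "P5")]
ranked = sorted(
    ((pair, predict(w, b, pair_features(emb[pair[0]], emb[pair[1]]))) for pair in candidates),
    key=lambda item: item[1],
    reverse=True,
)
for pair, prob in ranked:
    print(pair, round(prob, 3))
```

In this toy setting, pairs whose vectors point in similar directions receive high scores, which mirrors the idea of prioritizing interactions that share a common chemical context.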