Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque

Biomedical data is accumulating at a fast pace and integrating it into a unified framework is a major challenge, so that multiple views of a given biological event can be considered simultaneously. Here we present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated...

Bibliographic Details
Main authors: Fernández-Torras, Adrià; Duran-Frigola, Miquel; Bertoni, Martino; Locatelli, Martina; Aloy, Patrick
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9463154/
https://www.ncbi.nlm.nih.gov/pubmed/36085310
http://dx.doi.org/10.1038/s41467-022-33026-0
_version_ 1784787336652914688
author Fernández-Torras, Adrià
Duran-Frigola, Miquel
Bertoni, Martino
Locatelli, Martina
Aloy, Patrick
collection PubMed
description Biomedical data is accumulating at a fast pace and integrating it into a unified framework is a major challenge, so that multiple views of a given biological event can be considered simultaneously. Here we present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated biomedical descriptors derived from a gigantic knowledge graph, displaying more than 450 thousand biological entities and 30 million relationships between them. The Bioteque integrates, harmonizes, and formats data collected from over 150 data sources, including 12 biological entities (e.g., genes, diseases, drugs) linked by 67 types of associations (e.g., ‘drug treats disease’, ‘gene interacts with gene’). We show how Bioteque descriptors facilitate the assessment of high-throughput protein-protein interactome data, the prediction of drug response and new repurposing opportunities, and demonstrate that they can be used off-the-shelf in downstream machine learning tasks without loss of performance with respect to using original data. The Bioteque thus offers a thoroughly processed, tractable, and highly optimized assembly of the biomedical knowledge available in the public domain.
format Online
Article
Text
id pubmed-9463154
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9463154 2022-09-11 Nat Commun Article
Nature Publishing Group UK 2022-09-09 /pmc/articles/PMC9463154/ /pubmed/36085310 http://dx.doi.org/10.1038/s41467-022-33026-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9463154/
https://www.ncbi.nlm.nih.gov/pubmed/36085310
http://dx.doi.org/10.1038/s41467-022-33026-0