Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque
Biomedical data is accumulating at a fast pace and integrating it into a unified framework is a major challenge, so that multiple views of a given biological event can be considered simultaneously. Here we present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated...
Main authors: | Fernández-Torras, Adrià; Duran-Frigola, Miquel; Bertoni, Martino; Locatelli, Martina; Aloy, Patrick |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9463154/ https://www.ncbi.nlm.nih.gov/pubmed/36085310 http://dx.doi.org/10.1038/s41467-022-33026-0 |
_version_ | 1784787336652914688 |
---|---|
author | Fernández-Torras, Adrià Duran-Frigola, Miquel Bertoni, Martino Locatelli, Martina Aloy, Patrick |
author_sort | Fernández-Torras, Adrià |
collection | PubMed |
description | Biomedical data is accumulating at a fast pace and integrating it into a unified framework is a major challenge, so that multiple views of a given biological event can be considered simultaneously. Here we present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated biomedical descriptors derived from a gigantic knowledge graph, displaying more than 450 thousand biological entities and 30 million relationships between them. The Bioteque integrates, harmonizes, and formats data collected from over 150 data sources, including 12 biological entities (e.g., genes, diseases, drugs) linked by 67 types of associations (e.g., ‘drug treats disease’, ‘gene interacts with gene’). We show how Bioteque descriptors facilitate the assessment of high-throughput protein-protein interactome data, the prediction of drug response and new repurposing opportunities, and demonstrate that they can be used off-the-shelf in downstream machine learning tasks without loss of performance with respect to using original data. The Bioteque thus offers a thoroughly processed, tractable, and highly optimized assembly of the biomedical knowledge available in the public domain. |
format | Online Article Text |
id | pubmed-9463154 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9463154 2022-09-11 Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque Fernández-Torras, Adrià Duran-Frigola, Miquel Bertoni, Martino Locatelli, Martina Aloy, Patrick Nat Commun Article Nature Publishing Group UK 2022-09-09 /pmc/articles/PMC9463154/ /pubmed/36085310 http://dx.doi.org/10.1038/s41467-022-33026-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9463154/ https://www.ncbi.nlm.nih.gov/pubmed/36085310 http://dx.doi.org/10.1038/s41467-022-33026-0 |