A publication-wide association study (PWAS), historical language models to prioritise novel therapeutic drug targets

Most biomedical knowledge is published as text, making it challenging to analyse using traditional statistical methods. In contrast, machine-interpretable data primarily comes from structured property databases, which represent only a fraction of the knowledge present in the biomedical literature. C...


Bibliographic Details
Main authors:	Narganes-Carlón, David; Crowther, Daniel J.; Pearson, Ewan R.
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10209167/ https://www.ncbi.nlm.nih.gov/pubmed/37225853 http://dx.doi.org/10.1038/s41598-023-35597-4

author	Narganes-Carlón, David; Crowther, Daniel J.; Pearson, Ewan R.
collection	PubMed
description	Most biomedical knowledge is published as text, making it challenging to analyse using traditional statistical methods. In contrast, machine-interpretable data primarily comes from structured property databases, which represent only a fraction of the knowledge present in the biomedical literature. Crucial insights and inferences can be drawn from these publications by the scientific community. We trained language models on literature from different time periods to evaluate their ranking of prospective gene-disease associations and protein–protein interactions. Using 28 distinct historical text corpora of abstracts published between 1995 and 2022, we trained independent Word2Vec models to prioritise associations that were likely to be reported in future years. This study demonstrates that biomedical knowledge can be encoded as word embeddings without the need for human labelling or supervision. Language models effectively capture drug discovery concepts such as clinical tractability, disease associations, and biochemical pathways. Additionally, these models can prioritise hypotheses years before their initial reporting. Our findings underscore the potential for extracting yet-to-be-discovered relationships through data-driven approaches, leading to generalised biomedical literature mining for potential therapeutic drug targets. The Publication-Wide Association Study (PWAS) enables the prioritisation of under-explored targets and provides a scalable system for accelerating early-stage target ranking, irrespective of the specific disease of interest.
format	Online Article Text
id	pubmed-10209167
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10209167 2023-05-26 Sci Rep Article
Nature Publishing Group UK 2023-05-24 /pmc/articles/PMC10209167/ /pubmed/37225853 http://dx.doi.org/10.1038/s41598-023-35597-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	A publication-wide association study (PWAS), historical language models to prioritise novel therapeutic drug targets
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10209167/ https://www.ncbi.nlm.nih.gov/pubmed/37225853 http://dx.doi.org/10.1038/s41598-023-35597-4
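
The description above mentions independent Word2Vec models trained on year-sliced corpora and used to prioritise gene-disease associations. The record does not say how associations are scored, so the following is only a minimal sketch under stated assumptions: cosine similarity is used as the score (a common choice for word embeddings, not confirmed as the authors' method), the vectors are small hand-written stand-ins for trained embeddings, and the names `rank_candidates`, `models_by_year`, `gene_a`, `gene_b` and `gene_c` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(embeddings, disease, candidates):
    """Rank candidate terms by descending cosine similarity to the disease term.

    Candidates missing from the vocabulary are skipped; an empty list is
    returned when the disease term itself is not in the vocabulary.
    """
    if disease not in embeddings:
        return []
    target = embeddings[disease]
    scored = [(g, cosine(embeddings[g], target)) for g in candidates if g in embeddings]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# ASSUMPTION: toy 2-dimensional vectors standing in for embeddings taken from
# two independently trained, year-sliced models. Real models would be trained
# on the abstracts available up to each cut-off year.
models_by_year = {
    2000: {"diabetes": [1.0, 0.0], "gene_a": [0.9, 0.1], "gene_b": [0.1, 0.9]},
    2010: {
        "diabetes": [1.0, 0.0],
        "gene_a": [0.8, 0.2],
        "gene_b": [0.7, 0.3],
        "gene_c": [0.0, 1.0],
    },
}

# One ranking per historical model, so rankings can be compared across years.
rankings = {
    year: rank_candidates(emb, "diabetes", ["gene_a", "gene_b", "gene_c"])
    for year, emb in models_by_year.items()
}

for year, ranked in sorted(rankings.items()):
    print(year, [(gene, round(score, 3)) for gene, score in ranked])
```

Keeping one ranking per cut-off year makes it possible to check whether a term that ranks highly in an earlier model is first reported as an association in later literature, which is the kind of retrospective evaluation the description outlines.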
