
TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery

Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine coverage o...


Bibliographic Details
Main Authors: Serrano Nájera, Guillermo; Narganes Carlón, David; Crowther, Daniel J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8333311/
https://www.ncbi.nlm.nih.gov/pubmed/34344904
http://dx.doi.org/10.1038/s41598-021-94897-9
author Serrano Nájera, Guillermo
Narganes Carlón, David
Crowther, Daniel J.
collection PubMed
description Target identification and prioritisation are prominent first steps in modern drug discovery. Traditionally, individual scientists have used their expertise to manually interpret scientific literature and prioritise opportunities. However, increasing publication rates and the wider routine coverage of human genes by omic-scale research make it difficult to maintain meaningful overviews from which to identify promising new trends. Here we propose an automated yet flexible pipeline that identifies trends in the scientific corpus which align with the specific interests of a researcher and facilitate an initial prioritisation of opportunities. Using a procedure based on co-citation networks and machine learning, genes and diseases are first parsed from PubMed articles using a novel named entity recognition system together with publication date and supporting information. Then recurrent neural networks are trained to predict the publication dynamics of all human genes. For a user-defined therapeutic focus, genes generating more publications or citations are identified as high-interest targets. We also used topic detection routines to help understand why a gene is trendy and implement a system to propose the most prominent review articles for a potential target. This TrendyGenes pipeline detects emerging targets and pathways and provides a new way to explore the literature for individual researchers, pharmaceutical companies and funding agencies.
format Online
Article
Text
id pubmed-8333311
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8333311 2021-08-05 Sci Rep Article
Nature Publishing Group UK 2021-08-03 /pmc/articles/PMC8333311/ /pubmed/34344904 http://dx.doi.org/10.1038/s41598-021-94897-9 Text en © The Author(s) 2021, corrected publication 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title TrendyGenes, a computational pipeline for the detection of literature trends in academia and drug discovery
topic Article