Automated recognition of functional compound-protein relationships in literature

MOTIVATION: Much effort has been invested in the identification of protein-protein interactions using text mining and machine learning methods. The extraction of functional relationships between chemical compounds and proteins from literature has received much less attention, and no ready-to-use ope...

Bibliographic Details
Main authors: Döring, Kersten; Qaseem, Ammar; Becer, Michael; Li, Jianyu; Mishra, Pankaj; Gao, Mingjie; Kirchner, Pascal; Sauter, Florian; Telukunta, Kiran K.; Moumbock, Aurélien F. A.; Thomas, Philippe; Günther, Stefan
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7053725/
https://www.ncbi.nlm.nih.gov/pubmed/32126064
http://dx.doi.org/10.1371/journal.pone.0220925
collection PubMed
description MOTIVATION: Much effort has been invested in the identification of protein-protein interactions using text mining and machine learning methods. The extraction of functional relationships between chemical compounds and proteins from literature has received much less attention, and no ready-to-use open-source software is so far available for this task.
METHOD: We created a new benchmark dataset of 2,613 sentences from abstracts containing annotations of proteins, small molecules, and their relationships. Two kernel methods were applied to classify these relationships as functional or non-functional, named shallow linguistic and all-paths graph kernel. Furthermore, the benefit of interaction verbs in sentences was evaluated.
RESULTS: The cross-validation of the all-paths graph kernel (AUC value: 84.6%, F1 score: 79.0%) shows slightly better results than the shallow linguistic kernel (AUC value: 82.5%, F1 score: 77.2%) on our benchmark dataset. Both models achieve state-of-the-art performance in the research area of relation extraction. Furthermore, the combination of shallow linguistic and all-paths graph kernel could further increase the overall performance slightly. We used each of the two kernels to identify functional relationships in all PubMed abstracts (29 million) and provide the results, including recorded processing time.
AVAILABILITY: The software for the tested kernels, the benchmark, the processed 29 million PubMed abstracts, all evaluation scripts, as well as the scripts for processing the complete PubMed database are freely available at https://github.com/KerstenDoering/CPI-Pipeline.
format Online
Article
Text
id pubmed-7053725
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7053725 2020-03-12 PLoS One Research Article
Public Library of Science 2020-03-03 /pmc/articles/PMC7053725/ /pubmed/32126064 http://dx.doi.org/10.1371/journal.pone.0220925 Text en © 2020 Döring et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article