
Automatic extraction of protein-protein interactions using grammatical relationship graph

BACKGROUND: Relationships between bio-entities (genes, proteins, diseases, etc.) constitute a significant part of our knowledge. Most of this information is documented as unstructured text in different forms, such as books, articles and on-line pages. Automatic extraction of such information and sto...


Bibliographic Details
Main Authors: Yu, Kaixian, Lung, Pei-Yau, Zhao, Tingting, Zhao, Peixiang, Tseng, Yan-Yuan, Zhang, Jinfeng
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6069288/
https://www.ncbi.nlm.nih.gov/pubmed/30066644
http://dx.doi.org/10.1186/s12911-018-0628-4
author Yu, Kaixian
Lung, Pei-Yau
Zhao, Tingting
Zhao, Peixiang
Tseng, Yan-Yuan
Zhang, Jinfeng
collection PubMed
description BACKGROUND: Relationships between bio-entities (genes, proteins, diseases, etc.) constitute a significant part of our knowledge. Most of this information is documented as unstructured text in different forms, such as books, articles and on-line pages. Automatically extracting such information and storing it in structured form could help researchers access it more easily and also make it possible to incorporate it in advanced integrative analysis. In this study, we developed a novel approach to extract bio-entity relationship information using Natural Language Processing (NLP) and a graph-theoretic algorithm.
METHODS: Our method, called GRGT (Grammatical Relationship Graph for Triplets), not only extracts the pairs of terms that have certain relationships, but also extracts the type of relationship (the word describing the relationship). In addition, the directionality of the relationship can also be extracted. Our method is based on the assumption that a triplet exists for a pair of interactions. A triplet is defined as two terms (entities) and an interaction word describing the relationship of the two terms in a sentence. We first use a sentence parsing tool to obtain the sentence structure represented as a dependency graph where words are nodes and edges are typed dependencies. The shortest paths among the pairs of words in the triplet are then extracted, which form the basis for our information extraction method. A flexible pattern-matching scheme was then used to match a triplet graph with unknown relationship to those triplet graphs with labels (True or False) in the database.
RESULTS: We applied the method to three benchmark datasets to extract protein-protein interactions (PPIs), and obtained better precision than the top-performing methods in the literature.
CONCLUSIONS: We have developed a method to extract protein-protein interactions from biomedical literature. PPIs extracted by our method have higher precision than those extracted by other methods, suggesting that our method can be used to effectively extract PPIs and deposit them into databases. Beyond extracting PPIs, our method could be easily extended to extracting relationship information between other bio-entities.
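The dependency-graph step described under METHODS can be illustrated with a minimal sketch. This is not the authors' GRGT implementation: the toy sentence, the hand-written dependency edges, and the function names (`build_graph`, `shortest_path`, `triplet_paths`) are all assumptions made for illustration, and a plain breadth-first search stands in for whatever shortest-path routine the paper uses. The pattern-matching stage is not sketched, because the abstract does not specify it.

```python
from collections import deque

# Hypothetical typed-dependency edges (head, dependent, relation) for the toy
# sentence "ProtA directly activates ProtB in yeast". Written by hand for
# illustration; a real pipeline would take these from a dependency parser.
EDGES = [
    ("activates", "ProtA", "nsubj"),
    ("activates", "ProtB", "dobj"),
    ("activates", "directly", "advmod"),
    ("activates", "yeast", "nmod"),
    ("yeast", "in", "case"),
]


def build_graph(edges):
    """Build an undirected adjacency map: word -> list of (neighbor, relation)."""
    graph = {}
    for head, dep, rel in edges:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the words on a shortest path, or None."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor, _rel in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def triplet_paths(graph, entity1, entity2, interaction_word):
    """Shortest paths among the three pairs of words in a triplet."""
    return {
        (entity1, interaction_word): shortest_path(graph, entity1, interaction_word),
        (interaction_word, entity2): shortest_path(graph, interaction_word, entity2),
        (entity1, entity2): shortest_path(graph, entity1, entity2),
    }


graph = build_graph(EDGES)
paths = triplet_paths(graph, "ProtA", "ProtB", "activates")
for pair, path in paths.items():
    print(pair, path)
```

In this toy graph the two entities are connected only through the interaction word, so the entity-to-entity path passes through "activates". Modifiers such as "directly" and "in yeast" fall off the shortest paths, which suggests why such paths make a compact basis for comparing triplets.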
format Online
Article
Text
id pubmed-6069288
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6069288 2018-08-03 Automatic extraction of protein-protein interactions using grammatical relationship graph Yu, Kaixian Lung, Pei-Yau Zhao, Tingting Zhao, Peixiang Tseng, Yan-Yuan Zhang, Jinfeng BMC Med Inform Decis Mak Research BioMed Central 2018-07-23 /pmc/articles/PMC6069288/ /pubmed/30066644 http://dx.doi.org/10.1186/s12911-018-0628-4 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Automatic extraction of protein-protein interactions using grammatical relationship graph
topic Research