BioC-compatible full-text passage detection for protein–protein interactions using extended dependency graph

There has been a large growth in the number of biomedical publications that report experimental results. Many of these results concern detection of protein–protein interactions (PPI). In BioCreative V, we participated in the BioC task and developed a PPI system to detect text passages with PPIs in t...

Bibliographic Details
Main Authors: Peng, Yifan; Arighi, Cecilia; Wu, Cathy H.; Vijay-Shanker, K.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4915133/
https://www.ncbi.nlm.nih.gov/pubmed/27170286
http://dx.doi.org/10.1093/database/baw072
_version_ 1782438651228061696
author Peng, Yifan
Arighi, Cecilia
Wu, Cathy H.
Vijay-Shanker, K.
collection PubMed
description There has been a large growth in the number of biomedical publications that report experimental results. Many of these results concern detection of protein–protein interactions (PPI). In BioCreative V, we participated in the BioC task and developed a PPI system to detect text passages with PPIs in the full-text articles. By adopting the BioC format, the output of the system can be seamlessly added to the biocuration pipeline with little effort required for the system integration. A distinctive feature of our PPI system is that it utilizes extended dependency graph, an intermediate level of representation that attempts to abstract away syntactic variations in text. As a result, we are able to use only a limited set of rules to extract PPI pairs in the sentences, and additional rules to detect additional passages for PPI pairs. For evaluation, we used the 95 articles that were provided for the BioC annotation task. We retrieved the unique PPIs from the BioGRID database for these articles and show that our system achieves a recall of 83.5%. In order to evaluate the detection of passages with PPIs, we further annotated Abstract and Results sections of 20 documents from the dataset and show that an f-value of 80.5% was obtained. To evaluate the generalizability of the system, we also conducted experiments on AIMed, a well-known PPI corpus. We achieved an f-value of 76.1% for sentence detection and an f-value of 64.7% for unique PPI detection. Database URL: http://proteininformationresource.org/iprolink/corpora
format Online
Article
Text
id pubmed-4915133
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-49151332016-06-22 BioC-compatible full-text passage detection for protein–protein interactions using extended dependency graph Peng, Yifan Arighi, Cecilia Wu, Cathy H. Vijay-Shanker, K. Database (Oxford) Original Article There has been a large growth in the number of biomedical publications that report experimental results. Many of these results concern detection of protein–protein interactions (PPI). In BioCreative V, we participated in the BioC task and developed a PPI system to detect text passages with PPIs in the full-text articles. By adopting the BioC format, the output of the system can be seamlessly added to the biocuration pipeline with little effort required for the system integration. A distinctive feature of our PPI system is that it utilizes extended dependency graph, an intermediate level of representation that attempts to abstract away syntactic variations in text. As a result, we are able to use only a limited set of rules to extract PPI pairs in the sentences, and additional rules to detect additional passages for PPI pairs. For evaluation, we used the 95 articles that were provided for the BioC annotation task. We retrieved the unique PPIs from the BioGRID database for these articles and show that our system achieves a recall of 83.5%. In order to evaluate the detection of passages with PPIs, we further annotated Abstract and Results sections of 20 documents from the dataset and show that an f-value of 80.5% was obtained. To evaluate the generalizability of the system, we also conducted experiments on AIMed, a well-known PPI corpus. We achieved an f-value of 76.1% for sentence detection and an f-value of 64.7% for unique PPI detection. Database URL: http://proteininformationresource.org/iprolink/corpora Oxford University Press 2016-05-11 /pmc/articles/PMC4915133/ /pubmed/27170286 http://dx.doi.org/10.1093/database/baw072 Text en © The Author(s) 2016. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title BioC-compatible full-text passage detection for protein–protein interactions using extended dependency graph
topic Original Article