Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features
BACKGROUND: Protein-protein interaction (PPI) extraction from published scientific articles is one key issue in biological research due to its importance in grasping biological processes. Despite considerable advances of recent research in automatic PPI extraction from articles, demand remains to en...
Main authors: | Thuy Phan, Thi Thanh; Ohkawa, Takenao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4965725/ https://www.ncbi.nlm.nih.gov/pubmed/27454611 http://dx.doi.org/10.1186/s12859-016-1100-z |
_version_ | 1782445302945415168 |
---|---|
author | Thuy Phan, Thi Thanh; Ohkawa, Takenao |
author_facet | Thuy Phan, Thi Thanh; Ohkawa, Takenao |
author_sort | Thuy Phan, Thi Thanh |
collection | PubMed |
description | BACKGROUND: Protein-protein interaction (PPI) extraction from published scientific articles is a key issue in biological research because of its importance in understanding biological processes. Despite considerable recent advances in automatic PPI extraction from articles, there is still demand to improve the performance of existing methods. RESULTS: Our feature-based method combines the strengths of many diverse kinds of features, such as lexical and word context features derived from sentences, syntactic features derived from parse trees, and features using existing patterns, to extract PPIs automatically from articles. We assemble these features into four groups of related features and define a contribution level (CL) for each group. Our method consists of two steps. First, we divide the training set into subsets based on the structure of the sentence and the existence of significant keywords (SKs) and apply the sentence patterns given in advance to each subset. Second, we automatically perform feature selection based on the CL values of the four groups and the k-nearest neighbor algorithm (k-NN) through three approaches: (1) focusing on the group with the best contribution level (BEST1G); (2) unoptimized combination of three groups with the best contribution levels (U3G); (3) optimized combination of two groups with the best contribution levels (O2G). CONCLUSIONS: Our method outperforms other state-of-the-art PPI extraction systems in terms of F-score on the HPRD50 corpus and achieves promising results that are comparable with these PPI extraction systems on other corpora. Further, on all the corpora our method obtains a better F-score than k-NN alone, which does not exploit the CLs of the groups of related features. |
format | Online Article Text |
id | pubmed-4965725 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4965725 2016-08-02 Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features Thuy Phan, Thi Thanh; Ohkawa, Takenao BMC Bioinformatics Research BACKGROUND: Protein-protein interaction (PPI) extraction from published scientific articles is a key issue in biological research because of its importance in understanding biological processes. Despite considerable recent advances in automatic PPI extraction from articles, there is still demand to improve the performance of existing methods. RESULTS: Our feature-based method combines the strengths of many diverse kinds of features, such as lexical and word context features derived from sentences, syntactic features derived from parse trees, and features using existing patterns, to extract PPIs automatically from articles. We assemble these features into four groups of related features and define a contribution level (CL) for each group. Our method consists of two steps. First, we divide the training set into subsets based on the structure of the sentence and the existence of significant keywords (SKs) and apply the sentence patterns given in advance to each subset. Second, we automatically perform feature selection based on the CL values of the four groups and the k-nearest neighbor algorithm (k-NN) through three approaches: (1) focusing on the group with the best contribution level (BEST1G); (2) unoptimized combination of three groups with the best contribution levels (U3G); (3) optimized combination of two groups with the best contribution levels (O2G). CONCLUSIONS: Our method outperforms other state-of-the-art PPI extraction systems in terms of F-score on the HPRD50 corpus and achieves promising results that are comparable with these PPI extraction systems on other corpora. Further, on all the corpora our method obtains a better F-score than k-NN alone, which does not exploit the CLs of the groups of related features. BioMed Central 2016-07-25 /pmc/articles/PMC4965725/ /pubmed/27454611 http://dx.doi.org/10.1186/s12859-016-1100-z Text en © Phan and Ohkawa. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Thuy Phan, Thi Thanh Ohkawa, Takenao Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features |
title | Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4965725/ https://www.ncbi.nlm.nih.gov/pubmed/27454611 http://dx.doi.org/10.1186/s12859-016-1100-z |
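
The group-based feature selection summarized in the description can be illustrated with a small sketch. The record does not say how the contribution level (CL) is computed, so this example assumes, purely for illustration, that a group's CL is the leave-one-out F-score of k-NN restricted to that group's feature columns. The group names, the toy data, and every function name below are hypothetical and are not taken from the article. Only the BEST1G and U3G selections are shown; the optimized two-group combination (O2G) is not sketched because the record gives no detail on the optimization.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    ranked = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def f_score(y_true, y_pred, positive=1):
    """F1 score for the positive (interaction) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def contribution_level(X, y, columns, k=3):
    """ASSUMED CL: leave-one-out F-score of k-NN using only `columns`."""
    Xg = [[row[c] for c in columns] for row in X]
    preds = []
    for i in range(len(Xg)):
        rest_X, rest_y = Xg[:i] + Xg[i + 1:], y[:i] + y[i + 1:]
        preds.append(knn_predict(rest_X, rest_y, Xg[i], k))
    return f_score(y, preds)


def rank_groups(X, y, groups, k=3):
    """Return group names sorted by descending CL, plus the CL of each group."""
    cls = {name: contribution_level(X, y, cols, k) for name, cols in groups.items()}
    return sorted(cls, key=cls.get, reverse=True), cls


def best1g(ranking, groups):
    """BEST1G-style selection: columns of the single best-ranked group."""
    return sorted(groups[ranking[0]])


def u3g(ranking, groups):
    """U3G-style selection: union of the three best-ranked groups, unweighted."""
    return sorted(c for name in ranking[:3] for c in groups[name])


# Toy data (hypothetical): four one-column feature groups, eight candidate pairs.
groups = {"lexical": [0], "context": [1], "syntactic": [2], "pattern": [3]}
X = [
    [1.0, 0.2, 0.5, 0.9], [0.9, 0.8, 0.1, 0.1],
    [1.1, 0.5, 0.9, 0.5], [0.95, 0.1, 0.3, 0.3],
    [0.0, 0.25, 0.55, 0.85], [0.1, 0.75, 0.15, 0.15],
    [-0.1, 0.45, 0.85, 0.55], [0.05, 0.15, 0.35, 0.25],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = interacting pair, 0 = non-interacting

ranking, cls = rank_groups(X, y, groups)
selected_best1g = best1g(ranking, groups)
selected_u3g = u3g(ranking, groups)
print(ranking, selected_best1g, selected_u3g)
```

In this toy setup only the first column separates the two classes, so the group built on it ranks first. A final classifier would then be trained on the selected columns only; how the article weights or optimizes the combined groups is not stated in this record.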