
Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features

BACKGROUND: Protein-protein interaction (PPI) extraction from published scientific articles is one key issue in biological research due to its importance in grasping biological processes. Despite considerable advances of recent research in automatic PPI extraction from articles, demand remains to en...


Bibliographic Details
Main Authors: Thuy Phan, Thi Thanh, Ohkawa, Takenao
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4965725/
https://www.ncbi.nlm.nih.gov/pubmed/27454611
http://dx.doi.org/10.1186/s12859-016-1100-z
author Thuy Phan, Thi Thanh
Ohkawa, Takenao
collection PubMed
description BACKGROUND: Protein-protein interaction (PPI) extraction from published scientific articles is a key issue in biological research because of its importance for understanding biological processes. Despite considerable recent advances in automatic PPI extraction from articles, the performance of existing methods still needs to be improved. RESULTS: Our feature-based method combines the strengths of many diverse kinds of features, such as lexical and word context features derived from sentences, syntactic features derived from parse trees, and features based on existing patterns, to extract PPIs automatically from articles. From these abundant features, we assemble related features into four groups and define a contribution level (CL) for each group. Our method consists of two steps. First, we divide the training set into subsets based on the structure of the sentence and the presence of significant keywords (SKs), and apply predefined sentence patterns to each subset. Second, we automatically perform feature selection based on the CL values of the four feature groups and the k-nearest neighbor algorithm (k-NN) through three approaches: (1) focusing on the group with the best contribution level (BEST1G); (2) an unoptimized combination of the three groups with the best contribution levels (U3G); (3) an optimized combination of the two groups with the best contribution levels (O2G). CONCLUSIONS: Our method outperforms other state-of-the-art PPI extraction systems in terms of F-score on the HPRD50 corpus and achieves promising results that are comparable with these systems on other corpora. Further, on all corpora our method obtains a better F-score than k-NN alone, that is, without exploiting the CLs of the groups of related features.
format Online
Article
Text
id pubmed-4965725
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4965725 2016-08-02 Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features Thuy Phan, Thi Thanh Ohkawa, Takenao BMC Bioinformatics Research BioMed Central 2016-07-25 /pmc/articles/PMC4965725/ /pubmed/27454611 http://dx.doi.org/10.1186/s12859-016-1100-z Text en © Phan and Ohkawa. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Protein-protein interaction extraction with feature selection by evaluating contribution levels of groups consisting of related features
topic Research
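
The feature-selection step summarized in the description field (scoring four groups of related features by contribution level, then classifying with k-NN) can be illustrated with a minimal sketch. The record does not say how the CL is computed, so this sketch assumes, purely for illustration, that a group's CL is the leave-one-out F-score of k-NN restricted to that group's features. The group names, column layout, and toy data are hypothetical, and the O2G variant is omitted because its optimization procedure is not described here.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def f_score(gold, pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def loo_f_score(X, y, columns, k=3):
    """Leave-one-out F-score of k-NN using only the given feature columns."""
    Xp = [[row[c] for c in columns] for row in X]
    pred = [
        knn_predict(Xp[:i] + Xp[i + 1:], y[:i] + y[i + 1:], Xp[i], k)
        for i in range(len(Xp))
    ]
    return f_score(y, pred)


def contribution_levels(X, y, groups, k=3):
    """ASSUMPTION: CL of a group = leave-one-out F-score with that group alone."""
    return {name: loo_f_score(X, y, cols, k) for name, cols in groups.items()}


def top_groups(cl, groups, n):
    """Names of the n groups with the highest CL, plus their merged columns."""
    names = sorted(cl, key=cl.get, reverse=True)[:n]
    return names, [c for name in names for c in groups[name]]


# Hypothetical toy data: one column per feature group, 1 = interacting pair.
groups = {"lexical": [0], "context": [1], "syntactic": [2], "pattern": [3]}
X = [
    [1.0, 0.8, 0.5, 0.3], [0.9, 0.7, 0.1, 0.6],
    [1.1, 0.2, 0.5, 0.3], [1.0, 0.9, 0.1, 0.6],
    [0.0, 0.1, 0.5, 0.3], [0.1, 0.3, 0.1, 0.6],
    [-0.1, 0.75, 0.5, 0.3], [0.0, 0.2, 0.1, 0.6],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

cl = contribution_levels(X, y, groups)
best1_names, best1_cols = top_groups(cl, groups, 1)  # BEST1G-style selection
u3_names, u3_cols = top_groups(cl, groups, 3)        # U3G-style selection

print("CL per group:", cl)
print("BEST1G group:", best1_names, "F-score:", loo_f_score(X, y, best1_cols))
print("U3G groups:", u3_names, "F-score:", loo_f_score(X, y, u3_cols))
```

In this toy setup the hypothetical "lexical" column separates the two classes cleanly, so it receives the highest CL and is the group a BEST1G-style selection keeps. With real data the groups would contain many features each, and the CL definition should follow the article itself rather than this stand-in.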