Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature

BACKGROUND: The selection of relevant articles for curation, and linking those articles to experimental techniques confirming the findings became one of the primary subjects of the recent BioCreative III contest. The contest’s Protein-Protein Interaction (PPI) task consisted of two sub-tasks: Articl...

Bibliographic Details
Main Authors: Wang, Xinglong, Rak, Rafal, Restificar, Angelo, Nobata, Chikashi, Rupp, CJ, Batista-Navarro, Riza Theresa B, Nawaz, Raheel, Ananiadou, Sophia
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269934/
https://www.ncbi.nlm.nih.gov/pubmed/22151769
http://dx.doi.org/10.1186/1471-2105-12-S8-S11
_version_ 1782222522356334592
author Wang, Xinglong
Rak, Rafal
Restificar, Angelo
Nobata, Chikashi
Rupp, CJ
Batista-Navarro, Riza Theresa B
Nawaz, Raheel
Ananiadou, Sophia
collection PubMed
description BACKGROUND: Selecting relevant articles for curation, and linking those articles to the experimental techniques that confirm their findings, were among the primary subjects of the recent BioCreative III contest. The contest’s Protein-Protein Interaction (PPI) task consisted of two sub-tasks: the Article Classification Task (ACT) and the Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise the methods used in experiments for identifying the interactions in full-text articles.
RESULTS: We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and candidate interaction method obtained promising results, with an F1 score of 64.49% on the task’s development dataset. We also explored ways to combine this new approach with more conventional, multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthews correlation coefficient and AUC iP/R; for ACT, our best classifier was ranked second as measured by AUC iP/R, and was also competitive according to other metrics.
CONCLUSIONS: Our novel approach, which converts the multi-class, multi-label classification problem into a binary classification problem, showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall. For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that contextual words surrounding named entities, as well as the MeSH headings associated with the documents, were among the main contributors to the performance.
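The combination described in the conclusions can be illustrated with a small sketch. It is not the authors' system: the data structures, function names, method labels and toy documents below are all assumptions made for illustration. It shows only the general idea of collapsing binary decisions over (phrase, candidate method) pairs into per-document label sets, taking the union with a document-level classifier's output, and comparing micro-averaged precision, recall and F1.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record: (doc_id, text_phrase, candidate_method, predicted_positive).
PairDecision = Tuple[str, str, str, bool]
LabelSets = Dict[str, Set[str]]


def aggregate_pairs(decisions: Iterable[PairDecision]) -> LabelSets:
    """Assign a method to a document if any of its phrase/method pairs is positive."""
    labels: LabelSets = {}
    for doc_id, _phrase, method, positive in decisions:
        labels.setdefault(doc_id, set())
        if positive:
            labels[doc_id].add(method)
    return labels


def union_outputs(a: LabelSets, b: LabelSets) -> LabelSets:
    """Per-document union of the label sets produced by two systems."""
    return {d: a.get(d, set()) | b.get(d, set()) for d in set(a) | set(b)}


def micro_prf(gold: LabelSets, pred: LabelSets) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (document, method) pairs."""
    tp = fp = fn = 0
    for doc in set(gold) | set(pred):
        g, p = gold.get(doc, set()), pred.get(doc, set())
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
decisions = [
    ("d1", "yeast two-hybrid screen", "two hybrid", True),
    ("d1", "yeast two-hybrid screen", "pull down", False),
    ("d2", "GST pull-down assay", "pull down", True),
]
pairwise = aggregate_pairs(decisions)
doc_level = {"d1": {"coimmunoprecipitation"}, "d2": {"pull down"}}
gold = {"d1": {"two hybrid", "coimmunoprecipitation"}, "d2": {"pull down"}}
combined = union_outputs(pairwise, doc_level)

for name, pred in [("pairwise", pairwise), ("doc-level", doc_level), ("union", combined)]:
    p, r, f = micro_prf(gold, pred)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy case each system misses one gold label that the other finds, so the union recovers both. A union can only add labels, so recall never drops, while precision may fall if either system contributes false positives.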
format Online
Article
Text
id pubmed-3269934
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3269934 2012-02-02 BMC Bioinformatics Research BioMed Central 2011-10-03 /pmc/articles/PMC3269934/ /pubmed/22151769 http://dx.doi.org/10.1186/1471-2105-12-S8-S11 Text en Copyright ©2011 Wang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature
topic Research