Overview of the protein-protein interaction annotation extraction task of BioCreative II
BACKGROUND: The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison between the different techniques as well as with...
Main authors: | Krallinger, Martin; Leitner, Florian; Rodriguez-Penagos, Carlos; Valencia, Alfonso |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559988/ https://www.ncbi.nlm.nih.gov/pubmed/18834495 http://dx.doi.org/10.1186/gb-2008-9-s2-s4 |
_version_ | 1782159692739379200 |
---|---|
author | Krallinger, Martin Leitner, Florian Rodriguez-Penagos, Carlos Valencia, Alfonso |
author_sort | Krallinger, Martin |
collection | PubMed |
description | BACKGROUND: The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison between the different techniques as well as with manual curation was missing. RESULTS: We designed a community challenge, the BioCreative II protein-protein interaction (PPI) task, based on the main steps of a manual protein interaction annotation workflow. It was structured into four distinct subtasks related to: (a) detection of protein interaction-relevant articles; (b) extraction and normalization of protein interaction pairs; (c) retrieval of the interaction detection methods used; and (d) retrieval of actual text passages that provide evidence for protein interactions. A total of 26 teams submitted runs for at least one of the proposed subtasks. In the interaction article detection subtask, the top scoring team reached an F-score of 0.78. In the interaction pair extraction and mapping to SwissProt, a precision of 0.37 (with recall of 0.33) was obtained. For associating articles with an experimental interaction detection method, an F-score of 0.65 was achieved. As for the retrieval of the PPI passages best summarizing a given protein interaction in full-text articles, 19% of the submissions returned by one of the runs corresponded to curator-selected sentences. Curators extracted only the passages that best summarized a given interaction, implying that many of the automatically extracted ones could contain interaction information but did not correspond to the most informative sentences. CONCLUSION: The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific for each of the basic steps of the PPI extraction pipeline. The challenges identified range from problems in full-text format conversion of articles to difficulties in detecting interactor protein pairs and then linking them to their database records. Some limitations were also encountered when using a single (and possibly incomplete) reference database for protein normalization or when limiting search for interactor proteins to co-occurrence within a single sentence, when a mention might span neighboring sentences. Finally, distinguishing between novel, experimentally verified interactions (annotation relevant) and previously known interactions adds additional complexity to these tasks. |
format | Text |
id | pubmed-2559988 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2559988 2008-10-04 Overview of the protein-protein interaction annotation extraction task of BioCreative II Krallinger, Martin Leitner, Florian Rodriguez-Penagos, Carlos Valencia, Alfonso Genome Biol Research BACKGROUND: The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison between the different techniques as well as with manual curation was missing. RESULTS: We designed a community challenge, the BioCreative II protein-protein interaction (PPI) task, based on the main steps of a manual protein interaction annotation workflow. It was structured into four distinct subtasks related to: (a) detection of protein interaction-relevant articles; (b) extraction and normalization of protein interaction pairs; (c) retrieval of the interaction detection methods used; and (d) retrieval of actual text passages that provide evidence for protein interactions. A total of 26 teams submitted runs for at least one of the proposed subtasks. In the interaction article detection subtask, the top scoring team reached an F-score of 0.78. In the interaction pair extraction and mapping to SwissProt, a precision of 0.37 (with recall of 0.33) was obtained. For associating articles with an experimental interaction detection method, an F-score of 0.65 was achieved. As for the retrieval of the PPI passages best summarizing a given protein interaction in full-text articles, 19% of the submissions returned by one of the runs corresponded to curator-selected sentences. Curators extracted only the passages that best summarized a given interaction, implying that many of the automatically extracted ones could contain interaction information but did not correspond to the most informative sentences. CONCLUSION: The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific for each of the basic steps of the PPI extraction pipeline. The challenges identified range from problems in full-text format conversion of articles to difficulties in detecting interactor protein pairs and then linking them to their database records. Some limitations were also encountered when using a single (and possibly incomplete) reference database for protein normalization or when limiting search for interactor proteins to co-occurrence within a single sentence, when a mention might span neighboring sentences. Finally, distinguishing between novel, experimentally verified interactions (annotation relevant) and previously known interactions adds additional complexity to these tasks. BioMed Central 2008 2008-09-01 /pmc/articles/PMC2559988/ /pubmed/18834495 http://dx.doi.org/10.1186/gb-2008-9-s2-s4 Text en Copyright © 2008 Krallinger et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Krallinger, Martin Leitner, Florian Rodriguez-Penagos, Carlos Valencia, Alfonso Overview of the protein-protein interaction annotation extraction task of BioCreative II |
title | Overview of the protein-protein interaction annotation extraction task of BioCreative II |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559988/ https://www.ncbi.nlm.nih.gov/pubmed/18834495 http://dx.doi.org/10.1186/gb-2008-9-s2-s4 |