Mining physical protein-protein interactions from the literature


Bibliographic Details
Main authors: Huang, Minlie, Ding, Shilin, Wang, Hongning, Zhu, Xiaoyan
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559983/
https://www.ncbi.nlm.nih.gov/pubmed/18834490
http://dx.doi.org/10.1186/gb-2008-9-s2-s12
_version_ 1782159691562876928
author Huang, Minlie
Ding, Shilin
Wang, Hongning
Zhu, Xiaoyan
author_facet Huang, Minlie
Ding, Shilin
Wang, Hongning
Zhu, Xiaoyan
author_sort Huang, Minlie
collection PubMed
description BACKGROUND: Deciphering physical protein-protein interactions is fundamental to elucidating both the functions of proteins and biological processes. The development of high-throughput experimental technologies such as yeast two-hybrid screening has produced an explosion in data relating to interactions. Since manual curation is intensive in terms of time and cost, there is an urgent need for text-mining tools to facilitate the extraction of such information. The BioCreative (Critical Assessment of Information Extraction systems in Biology) challenge evaluation provided common standards and shared evaluation criteria to enable comparisons among different approaches. RESULTS: During the benchmark evaluation of BioCreative 2006, all of our results ranked in the top three places. In the task of filtering articles irrelevant to physical protein interactions, our method achieves a precision of 75.07%, a recall of 81.07%, and an AUC (area under the receiver operating characteristic curve) of 0.847. In the task of identifying protein mentions and normalizing mentions to molecule identifiers, our method is competitive among runs submitted, with a precision of 34.83%, a recall of 24.10%, and an F1 score of 28.5%. In extracting protein interaction pairs, our profile-based method was competitive on the SwissProt-only subset (precision = 36.95%, recall = 32.68%, and F1 score = 30.40%) and on the entire dataset (30.96%, 29.35%, and 26.20%, respectively). From the biologist's point of view, however, these findings are far from satisfactory. The error analysis presented in this report provides insight into how performance could be improved: three-quarters of false negatives were due to protein normalization problems (532/698), and about one-quarter were due to the system's failure to extract interactions correctly. CONCLUSION: We present a text-mining framework to extract physical protein-protein interactions from the literature. Three key issues are addressed, namely filtering irrelevant articles, identifying protein names and normalizing them to molecule identifiers, and extracting protein-protein interactions. Our system is among the top three performers in the benchmark evaluation of BioCreative 2006. The tool will be helpful for manual interaction curation and can greatly facilitate the process of extracting protein-protein interactions.
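Note on the reported scores: the abstract quotes precision, recall, and F1 values without stating how F1 was aggregated. The following is a minimal Python sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall; the function name f1_score and the variable names are illustrative and do not come from the paper. It checks the normalization-task figures and the false-negative share quoted above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Protein normalization task: precision 34.83%, recall 24.10% (from the abstract).
norm_f1 = f1_score(0.3483, 0.2410)

# Interaction-pair task, SwissProt-only subset: precision 36.95%, recall 32.68%.
pair_f1_from_totals = f1_score(0.3695, 0.3268)

# Share of false negatives attributed to normalization problems (532 of 698).
norm_fn_share = 532 / 698

print(f"normalization F1: {norm_f1:.3f}")
print(f"pair-extraction F1 from quoted P and R: {pair_f1_from_totals:.3f}")
print(f"false negatives from normalization: {norm_fn_share:.1%}")
```

Under this assumption the normalization F1 agrees with the reported 28.5%. The harmonic mean of the quoted pair-extraction precision and recall comes out higher than the reported 30.40%. That would be consistent with scores averaged per article rather than computed from pooled counts, but the record does not say which aggregation was used.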
format Text
id pubmed-2559983
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2559983 2008-10-04 Mining physical protein-protein interactions from the literature Huang, Minlie Ding, Shilin Wang, Hongning Zhu, Xiaoyan Genome Biol Research BioMed Central 2008 2008-09-01 /pmc/articles/PMC2559983/ /pubmed/18834490 http://dx.doi.org/10.1186/gb-2008-9-s2-s12 Text en Copyright © 2008 Huang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Huang, Minlie
Ding, Shilin
Wang, Hongning
Zhu, Xiaoyan
Mining physical protein-protein interactions from the literature
title Mining physical protein-protein interactions from the literature
title_full Mining physical protein-protein interactions from the literature
title_fullStr Mining physical protein-protein interactions from the literature
title_full_unstemmed Mining physical protein-protein interactions from the literature
title_short Mining physical protein-protein interactions from the literature
title_sort mining physical protein-protein interactions from the literature
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559983/
https://www.ncbi.nlm.nih.gov/pubmed/18834490
http://dx.doi.org/10.1186/gb-2008-9-s2-s12
work_keys_str_mv AT huangminlie miningphysicalproteinproteininteractionsfromtheliterature
AT dingshilin miningphysicalproteinproteininteractionsfromtheliterature
AT wanghongning miningphysicalproteinproteininteractionsfromtheliterature
AT zhuxiaoyan miningphysicalproteinproteininteractionsfromtheliterature