
Classifying protein-protein interaction articles using word and syntactic features

BACKGROUND: Identifying protein-protein interactions (PPIs) from literature is an important step in mining the function of individual proteins as well as their biological network. Since it is known that PPIs have distinctive patterns in text, machine learning approaches have been successfully applie...


Bibliographic Details
Main authors:	Kim, Sun; Wilbur, W John
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269944/ https://www.ncbi.nlm.nih.gov/pubmed/22151252 http://dx.doi.org/10.1186/1471-2105-12-S8-S9

_version_	1782222524689416192
author	Kim, Sun; Wilbur, W John
author_facet	Kim, Sun; Wilbur, W John
author_sort	Kim, Sun
collection	PubMed
description	BACKGROUND: Identifying protein-protein interactions (PPIs) from literature is an important step in mining the function of individual proteins as well as their biological network. Since it is known that PPIs have distinctive patterns in text, machine learning approaches have been successfully applied to mine these patterns. However, the complex nature of PPI description makes the extraction process difficult. RESULTS: Our approach utilizes both word and syntactic features to effectively capture PPI patterns from biomedical literature. The proposed method automatically identifies gene names by a Priority Model, then extracts grammar relations using a dependency parser. A large margin classifier with Huber loss function learns from the extracted features, and unknown articles are predicted using this data-driven model. For the BioCreative III ACT evaluation, our official runs were ranked in top positions by obtaining maximum 89.15% accuracy, 61.42% F1 score, 0.55306 MCC score, and 67.98% AUC iP/R score. CONCLUSIONS: Even though problems still remain, utilizing syntactic information for article-level filtering helps improve PPI ranking performance. The proposed system is a revision of previously developed algorithms in our group for the ACT evaluation. Our approach is valuable in showing how to use grammatical relations for PPI article filtering, in particular, with a limited training corpus. While current performance is far from satisfactory as an annotation tool, it is already useful for a PPI article search engine since users are mainly focused on highly-ranked results.
format	Online Article Text
id	pubmed-3269944
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3269944 2012-02-02 Classifying protein-protein interaction articles using word and syntactic features Kim, Sun; Wilbur, W John BMC Bioinformatics Research BioMed Central 2011-10-03 /pmc/articles/PMC3269944/ /pubmed/22151252 http://dx.doi.org/10.1186/1471-2105-12-S8-S9 Text en Copyright ©2011 Kim and Wilbur; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Kim, Sun Wilbur, W John Classifying protein-protein interaction articles using word and syntactic features
title	Classifying protein-protein interaction articles using word and syntactic features
title_full	Classifying protein-protein interaction articles using word and syntactic features
title_fullStr	Classifying protein-protein interaction articles using word and syntactic features
title_full_unstemmed	Classifying protein-protein interaction articles using word and syntactic features
title_short	Classifying protein-protein interaction articles using word and syntactic features
title_sort	classifying protein-protein interaction articles using word and syntactic features
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269944/ https://www.ncbi.nlm.nih.gov/pubmed/22151252 http://dx.doi.org/10.1186/1471-2105-12-S8-S9
work_keys_str_mv	AT kimsun classifyingproteinproteininteractionarticlesusingwordandsyntacticfeatures AT wilburwjohn classifyingproteinproteininteractionarticlesusingwordandsyntacticfeatures
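
The abstract in this record describes a large margin classifier with a Huber loss function trained on word and syntactic features, evaluated by accuracy, F1 and MCC. The following is a minimal sketch of that general idea, not the authors' system. It assumes the modified Huber loss (a common classification variant; the record does not define the loss), uses plain bag-of-words counts in place of the Priority Model and dependency-parser features, and trains on six invented toy sentences. All function names, parameters and data here are hypothetical.

```python
import math
import random
from collections import Counter

def featurize(text):
    """Bag-of-words counts; a stand-in for the word and syntactic features."""
    return Counter(text.lower().split())

def modified_huber_grad(margin):
    """Derivative of the modified Huber loss with respect to the margin y*f(x)."""
    if margin >= 1.0:
        return 0.0                     # correctly classified with enough margin
    if margin >= -1.0:
        return -2.0 * (1.0 - margin)   # quadratic region: (1 - margin)^2
    return -4.0                        # linear region: -4 * margin

def score(weights, bias, feats):
    return bias + sum(weights.get(t, 0.0) * c for t, c in feats.items())

def train(docs, labels, epochs=50, lr=0.05, l2=0.001, seed=0):
    """Plain SGD on the modified Huber loss; labels must be +1 or -1.
    Simplification: L2 shrinkage is applied only to features present in the
    current document."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, y in data:
            g = modified_huber_grad(y * score(weights, bias, feats))
            for t, c in feats.items():
                w = weights.get(t, 0.0)
                weights[t] = w - lr * (g * y * c + l2 * w)
            bias -= lr * g * y
    return weights, bias

def predict(weights, bias, text):
    return 1 if score(weights, bias, featurize(text)) >= 0 else -1

def evaluate(y_true, y_pred):
    """Accuracy, F1 and Matthews correlation coefficient for +1/-1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Invented toy corpus: +1 = describes an interaction, -1 = does not.
DOCS = [
    "proteinA binds proteinB in yeast",
    "proteinC interacts with proteinD directly",
    "kinase phosphorylates and binds substrate",
    "gene expression increased in tumor tissue",
    "patients were treated with the drug",
    "sequence analysis of the genome",
]
LABELS = [1, 1, 1, -1, -1, -1]

weights, bias = train(DOCS, LABELS)
predictions = [predict(weights, bias, d) for d in DOCS]
print(evaluate(LABELS, predictions))
```

The sketch scores only its own training sentences, so the printed metrics say nothing about real performance. Reproducing figures like those in the abstract would require the BioCreative III ACT data and the feature pipeline described there.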
