
Linguistic feature analysis for protein interaction extraction

BACKGROUND: The rapid growth of the amount of publicly available reports on biomedical experimental results has recently caused a boost of text mining approaches for protein interaction extraction. Most approaches rely implicitly or explicitly on linguistic, i.e., lexical and syntactic, data extract...


Bibliographic Details
Main authors:	Fayruzov, Timur; De Cock, Martine; Cornelis, Chris; Hoste, Veronique
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2781821/ https://www.ncbi.nlm.nih.gov/pubmed/19909518 http://dx.doi.org/10.1186/1471-2105-10-374

author	Fayruzov, Timur; De Cock, Martine; Cornelis, Chris; Hoste, Veronique
collection	PubMed
description	BACKGROUND: The rapid growth in the number of publicly available reports on biomedical experimental results has recently led to a surge of text mining approaches for protein interaction extraction. Most approaches rely implicitly or explicitly on linguistic, i.e., lexical and syntactic, data extracted from text. However, only a few attempts have been made to evaluate the contribution of the different feature types. In this work, we contribute to this evaluation by studying the relative importance of deep syntactic features (grammatical relations), shallow syntactic features (part-of-speech information) and lexical features. For this purpose, we use a recently proposed approach based on support vector machines with structured kernels. RESULTS: Our results reveal that the contribution of the different feature types varies across the data sets on which the experiments were conducted. The smaller the training corpus compared to the test data, the more important the role of grammatical relations becomes. Moreover, classifiers based on deep syntactic information prove to be more robust on heterogeneous texts that share little or no common vocabulary. CONCLUSION: Our findings suggest that grammatical relations play an important role in the interaction extraction task. Moreover, the net advantage of adding lexical and shallow syntactic features is small relative to the number of added features. This implies that efficient classifiers can be built using only a small fraction of the features typically used in recent approaches.
format	Text
id	pubmed-2781821
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2781821 2009-11-25 Linguistic feature analysis for protein interaction extraction Fayruzov, Timur; De Cock, Martine; Cornelis, Chris; Hoste, Veronique BMC Bioinformatics Research article BioMed Central 2009-11-12 /pmc/articles/PMC2781821/ /pubmed/19909518 http://dx.doi.org/10.1186/1471-2105-10-374 Text en Copyright ©2009 Fayruzov et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Linguistic feature analysis for protein interaction extraction
topic	Research article
