
Benchmarking natural-language parsers for biological applications using dependency graphs

BACKGROUND: Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy usi...
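The evaluation idea summarized in the abstract, comparing a parser's dependency graph against a gold-standard graph and optionally restricting the comparison to relations that matter for a given application, could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Dependency` type, the `score` function, the relation labels (`nsubj`, `prep_in`, `amod`), and the example sentence are all hypothetical.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple


class Dependency(NamedTuple):
    """One labelled edge of a dependency graph (hypothetical representation)."""
    head: str
    relation: str
    dependent: str


def score(
    gold: Iterable[Dependency],
    predicted: Iterable[Dependency],
    relations: Optional[Set[str]] = None,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over dependency triples.

    If `relations` is given, only edges with those labels are compared,
    which is one simple way to tailor the score to an application.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    if relations is not None:
        gold_set = {d for d in gold_set if d.relation in relations}
        pred_set = {d for d in pred_set if d.relation in relations}

    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Assumed toy data for "IL-2 is expressed in activated cells".
# A real evaluation would key edges on token positions, not bare words.
gold = [
    Dependency("expressed", "nsubj", "IL-2"),
    Dependency("expressed", "prep_in", "cells"),
    Dependency("cells", "amod", "activated"),
]
predicted = [
    Dependency("expressed", "nsubj", "IL-2"),
    Dependency("expressed", "prep_in", "cells"),
    Dependency("expressed", "amod", "activated"),  # wrong attachment
]

overall = score(gold, predicted)
subjects_only = score(gold, predicted, relations={"nsubj"})
```

In this toy case the misattached modifier lowers the overall score but leaves the subject-only score untouched, which illustrates how an application-specific filter can separate mistakes that matter from differences that do not.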


Bibliographic Details
Main Authors:	Clegg, Andrew B, Shepherd, Adrian J
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1797812/ https://www.ncbi.nlm.nih.gov/pubmed/17254351 http://dx.doi.org/10.1186/1471-2105-8-24

_version_	1782132316311650304
author	Clegg, Andrew B Shepherd, Adrian J
author_facet	Clegg, Andrew B Shepherd, Adrian J
author_sort	Clegg, Andrew B
collection	PubMed
description	BACKGROUND: Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. RESULTS: Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. CONCLUSION: Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques.
format	Text
id	pubmed-1797812
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1797812 2007-02-16 Benchmarking natural-language parsers for biological applications using dependency graphs Clegg, Andrew B Shepherd, Adrian J BMC Bioinformatics Research Article
BioMed Central 2007-01-25 /pmc/articles/PMC1797812/ /pubmed/17254351 http://dx.doi.org/10.1186/1471-2105-8-24 Text en Copyright © 2007 Clegg and Shepherd; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Clegg, Andrew B Shepherd, Adrian J Benchmarking natural-language parsers for biological applications using dependency graphs
title	Benchmarking natural-language parsers for biological applications using dependency graphs
title_full	Benchmarking natural-language parsers for biological applications using dependency graphs
title_fullStr	Benchmarking natural-language parsers for biological applications using dependency graphs
title_full_unstemmed	Benchmarking natural-language parsers for biological applications using dependency graphs
title_short	Benchmarking natural-language parsers for biological applications using dependency graphs
title_sort	benchmarking natural-language parsers for biological applications using dependency graphs
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1797812/ https://www.ncbi.nlm.nih.gov/pubmed/17254351 http://dx.doi.org/10.1186/1471-2105-8-24
work_keys_str_mv	AT cleggandrewb benchmarkingnaturallanguageparsersforbiologicalapplicationsusingdependencygraphs AT shepherdadrianj benchmarkingnaturallanguageparsersforbiologicalapplicationsusingdependencygraphs


Similar Items