Critical Assessment of Information Extraction Systems in Biology

An increasing number of groups are now working in the area of text mining, focusing on a wide range of problems and applying both statistical and linguistic approaches. However, it is not possible to compare the different approaches, because there are no common standards or evaluation criteria; in a...

Bibliographic Details
Main authors: Blaschke, Christian, Hirschman, Lynette, Yeh, Alexander, Valencia, Alfonso
Format: Text
Language: English
Published: Hindawi Publishing Corporation 2003
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447314/
https://www.ncbi.nlm.nih.gov/pubmed/18629031
http://dx.doi.org/10.1002/cfg.337
_version_ 1782156908639027200
author Blaschke, Christian
Hirschman, Lynette
Yeh, Alexander
Valencia, Alfonso
author_sort Blaschke, Christian
collection PubMed
description An increasing number of groups are now working in the area of text mining, focusing on a wide range of problems and applying both statistical and linguistic approaches. However, it is not possible to compare the different approaches, because there are no common standards or evaluation criteria; in addition, the various groups are addressing different problems, often using private datasets. As a result, it is impossible to determine how well the existing systems perform, and particularly what performance level can be expected in real applications. This is similar to the situation in text processing in the late 1980s, prior to the Message Understanding Conferences (MUCs). With the introduction of a common evaluation and standardized evaluation metrics as part of these conferences, it became possible to compare approaches, to identify those techniques that did or did not work and to make progress. This progress has resulted in a common pipeline of processes and a set of shared tools available to the general research community. The field of biology is ripe for a similar experiment. Inspired by this example, the BioLINK group (Biological Literature, Information and Knowledge [1]) is organizing a CASP-like evaluation for the text data-mining community applied to biology. The two main tasks specifically address two major bottlenecks for text mining in biology: (1) the correct detection of gene and protein names in text; and (2) the extraction of functional information related to proteins based on the GO classification system. For further information and participation details, see http://www.pdg.cnb.uam.es/BioLink/BioCreative.eval.html
format Text
id pubmed-2447314
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-2447314 2008-07-14 Critical Assessment of Information Extraction Systems in Biology Blaschke, Christian Hirschman, Lynette Yeh, Alexander Valencia, Alfonso Comp Funct Genomics Research Article Hindawi Publishing Corporation 2003-12 /pmc/articles/PMC2447314/ /pubmed/18629031 http://dx.doi.org/10.1002/cfg.337 Text en Copyright © 2003 Hindawi Publishing Corporation. http://creativecommons.org/licenses/by/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Critical Assessment of Information Extraction Systems in Biology
topic Research Article