
Benchmarking infrastructure for mutation text mining

BACKGROUND: Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. RESULTS: We propose a community-oriented annotation and ben...


Bibliographic Details
Main authors:	Klein, Artjom, Riazanov, Alexandre, Hindle, Matthew M, Baker, Christopher JO
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3939821/ https://www.ncbi.nlm.nih.gov/pubmed/24568600 http://dx.doi.org/10.1186/2041-1480-5-11

author	Klein, Artjom Riazanov, Alexandre Hindle, Matthew M Baker, Christopher JO
collection	PubMed
description	BACKGROUND: Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. RESULTS: We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. CONCLUSION: We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
format	Online Article Text
id	pubmed-3939821
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3939821 2014-03-04 Benchmarking infrastructure for mutation text mining Klein, Artjom Riazanov, Alexandre Hindle, Matthew M Baker, Christopher JO J Biomed Semantics Research BioMed Central 2014-02-25 /pmc/articles/PMC3939821/ /pubmed/24568600 http://dx.doi.org/10.1186/2041-1480-5-11 Text en Copyright © 2014 Klein et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Benchmarking infrastructure for mutation text mining
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3939821/ https://www.ncbi.nlm.nih.gov/pubmed/24568600 http://dx.doi.org/10.1186/2041-1480-5-11
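The description states that SPARQL queries compute performance metrics by comparing a system's annotations with the manually curated gold corpus. The sketch below illustrates only the underlying set arithmetic (precision, recall, F1) in plain Python, not the authors' SPARQL queries or their OWL annotation schema. It assumes, hypothetically, that each annotation has been reduced to a `(document_id, mutation)` pair; the `score` function, the `Scores` type, and the sample document IDs and mutations are illustrative names and data, not part of the published infrastructure.

```python
from __future__ import annotations

from typing import NamedTuple

# Hypothetical annotation shape: (document_id, mutation), e.g. ("doc1", "A123T").
Annotation = tuple[str, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score(gold: set[Annotation], predicted: set[Annotation]) -> Scores:
    """Compare system output with gold annotations by exact pair matching."""
    tp = len(gold & predicted)   # annotations present in both sets
    fp = len(predicted - gold)   # system annotations missing from the gold corpus
    fn = len(gold - predicted)   # gold annotations the system did not find
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical example data, not taken from the benchmark corpus.
    gold = {("doc1", "A123T"), ("doc1", "G12D"), ("doc2", "R175H")}
    predicted = {("doc1", "A123T"), ("doc2", "R175H"), ("doc2", "V600E")}
    # Two matches, one spurious and one missed annotation: all three scores are 2/3.
    print(score(gold, predicted))
```

Exact pair matching is the simplest choice. A real evaluation may also need to match on text spans or on normalized (grounded) mutation identifiers, which the RDF representation described above is meant to support.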


Similar items