
Moara: a Java library for extracting and normalizing gene and protein mentions

BACKGROUND: Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solution...


Bibliographic Details
Main authors:	Neves, Mariana L; Carazo, José-María; Pascual-Montano, Alberto
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851609/ https://www.ncbi.nlm.nih.gov/pubmed/20346105 http://dx.doi.org/10.1186/1471-2105-11-157

author	Neves, Mariana L; Carazo, José-María; Pascual-Montano, Alberto
collection	PubMed
description	BACKGROUND: Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solutions being reported, easily integrated tools to perform these tasks are not readily available. RESULTS: This study proposes a versatile and trainable Java library that implements gene/protein tagger and normalization steps based on machine learning approaches. The system has been trained for several model organisms and corpora but can be expanded to support new organisms and documents. CONCLUSIONS: Moara is a flexible, trainable and open-source system that is not specifically orientated to any organism and therefore does not require specific tuning in the algorithms or dictionaries utilized. Moara can be used as a stand-alone application or can be incorporated in the workflow of a more general text mining system.
format	Text
id	pubmed-2851609
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2851609; 2010-04-09; BMC Bioinformatics; Software; BioMed Central; 2010-03-26; /pmc/articles/PMC2851609/; /pubmed/20346105; http://dx.doi.org/10.1186/1471-2105-11-157; Text; en. Copyright ©2010 Neves et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Moara: a Java library for extracting and normalizing gene and protein mentions
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2851609/ https://www.ncbi.nlm.nih.gov/pubmed/20346105 http://dx.doi.org/10.1186/1471-2105-11-157
