Gene mention normalization and interaction extraction with context models and sentence motifs

BACKGROUND: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization - to identify biomedical objects in text - and extraction of qualified rel...

Bibliographic Details
Main Authors:	Hakenberg, Jörg; Plake, Conrad; Royer, Loic; Strobelt, Hendrik; Leser, Ulf; Schroeder, Michael
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559985/ https://www.ncbi.nlm.nih.gov/pubmed/18834492 http://dx.doi.org/10.1186/gb-2008-9-s2-s14

_version_	1782159692035784704
author	Hakenberg, Jörg; Plake, Conrad; Royer, Loic; Strobelt, Hendrik; Leser, Ulf; Schroeder, Michael
author_sort	Hakenberg, Jörg
collection	PubMed
description	BACKGROUND: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization - to identify biomedical objects in text - and extraction of qualified relationships between those objects. We describe a method for identifying genes and relationships between proteins. RESULTS: We present solutions to gene mention normalization and extraction of protein-protein interactions. For the first task, we identify genes by using background knowledge on each gene, namely annotations related to function, location, disease, and so on. Our approach currently achieves an f-measure of 86.4% on the BioCreative II gene normalization data. For the extraction of protein-protein interactions, we pursue an approach that builds on classical sequence analysis: motifs derived from multiple sequence alignments. The method achieves an f-measure of 24.4% (micro-average) in the BioCreative II interaction pair subtask. CONCLUSION: For gene mention normalization, our approach outperforms strategies that rely only on matching gene names against dictionaries, without invoking further knowledge on each gene. Motifs derived from alignments of sentences are successful at identifying protein interactions in text; the approach we present in this report is fully automated and performs similarly to systems that require human intervention at one or more stages. AVAILABILITY: Our methods for gene, protein, and species identification, and extraction of protein-protein interactions are available as part of the BioCreative Meta Services (BCMS), see .
format	Text
id	pubmed-2559985
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2559985 2008-10-04 Gene mention normalization and interaction extraction with context models and sentence motifs Hakenberg, Jörg; Plake, Conrad; Royer, Loic; Strobelt, Hendrik; Leser, Ulf; Schroeder, Michael Genome Biol Research BioMed Central 2008 2008-09-01 /pmc/articles/PMC2559985/ /pubmed/18834492 http://dx.doi.org/10.1186/gb-2008-9-s2-s14 Text en Copyright © 2008 Hakenberg et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Gene mention normalization and interaction extraction with context models and sentence motifs
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559985/ https://www.ncbi.nlm.nih.gov/pubmed/18834492 http://dx.doi.org/10.1186/gb-2008-9-s2-s14

Similar Items