Gene mention normalization and interaction extraction with context models and sentence motifs

BACKGROUND: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization - to identify biomedical objects in text - and extraction of qualified rel...

Bibliographic Details
Main authors: Hakenberg, Jörg; Plake, Conrad; Royer, Loic; Strobelt, Hendrik; Leser, Ulf; Schroeder, Michael
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559985/
https://www.ncbi.nlm.nih.gov/pubmed/18834492
http://dx.doi.org/10.1186/gb-2008-9-s2-s14
author Hakenberg, Jörg
Plake, Conrad
Royer, Loic
Strobelt, Hendrik
Leser, Ulf
Schroeder, Michael
collection PubMed
description BACKGROUND: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization - to identify biomedical objects in text - and extraction of qualified relationships between those objects. We describe a method for identifying genes and relationships between proteins. RESULTS: We present solutions to gene mention normalization and extraction of protein-protein interactions. For the first task, we identify genes by using background knowledge on each gene, namely annotations related to function, location, disease, and so on. Our approach currently achieves an f-measure of 86.4% on the BioCreative II gene normalization data. For the extraction of protein-protein interactions, we pursue an approach that builds on classical sequence analysis: motifs derived from multiple sequence alignments. The method achieves an f-measure of 24.4% (micro-average) in the BioCreative II interaction pair subtask. CONCLUSION: For gene mention normalization, our approach outperforms strategies that utilize only the matching of gene names against dictionaries, without invoking further knowledge on each gene. Motifs derived from alignments of sentences are successful at identifying protein interactions in text; the approach we present in this report is fully automated and performs similarly to systems that require human intervention at one or more stages. AVAILABILITY: Our methods for gene, protein, and species identification, and extraction of protein-protein interactions are available as part of the BioCreative Meta Services (BCMS), see .
format Text
id pubmed-2559985
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2559985 2008-10-04 Gene mention normalization and interaction extraction with context models and sentence motifs Hakenberg, Jörg Plake, Conrad Royer, Loic Strobelt, Hendrik Leser, Ulf Schroeder, Michael Genome Biol Research BACKGROUND: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization - to identify biomedical objects in text - and extraction of qualified relationships between those objects. We describe a method for identifying genes and relationships between proteins. RESULTS: We present solutions to gene mention normalization and extraction of protein-protein interactions. For the first task, we identify genes by using background knowledge on each gene, namely annotations related to function, location, disease, and so on. Our approach currently achieves an f-measure of 86.4% on the BioCreative II gene normalization data. For the extraction of protein-protein interactions, we pursue an approach that builds on classical sequence analysis: motifs derived from multiple sequence alignments. The method achieves an f-measure of 24.4% (micro-average) in the BioCreative II interaction pair subtask. CONCLUSION: For gene mention normalization, our approach outperforms strategies that utilize only the matching of gene names against dictionaries, without invoking further knowledge on each gene. Motifs derived from alignments of sentences are successful at identifying protein interactions in text; the approach we present in this report is fully automated and performs similarly to systems that require human intervention at one or more stages. AVAILABILITY: Our methods for gene, protein, and species identification, and extraction of protein-protein interactions are available as part of the BioCreative Meta Services (BCMS), see .
BioMed Central 2008 2008-09-01 /pmc/articles/PMC2559985/ /pubmed/18834492 http://dx.doi.org/10.1186/gb-2008-9-s2-s14 Text en Copyright © 2008 Hakenberg et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gene mention normalization and interaction extraction with context models and sentence motifs
topic Research