Gene mention normalization and interaction extraction with context models and sentence motifs
BACKGROUND: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization - to identify biomedical objects in text - and extraction of qualified rel...
Main authors: | Hakenberg, Jörg; Plake, Conrad; Royer, Loic; Strobelt, Hendrik; Leser, Ulf; Schroeder, Michael |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559985/ https://www.ncbi.nlm.nih.gov/pubmed/18834492 http://dx.doi.org/10.1186/gb-2008-9-s2-s14 |
_version_ | 1782159692035784704 |
---|---|
author | Hakenberg, Jörg; Plake, Conrad; Royer, Loic; Strobelt, Hendrik; Leser, Ulf; Schroeder, Michael |
author_facet | Hakenberg, Jörg; Plake, Conrad; Royer, Loic; Strobelt, Hendrik; Leser, Ulf; Schroeder, Michael |
author_sort | Hakenberg, Jörg |
collection | PubMed |
description | BACKGROUND: The goal of text mining is to make the information conveyed in scientific publications accessible to structured search and automatic analysis. Two important subtasks of text mining are entity mention normalization - to identify biomedical objects in text - and extraction of qualified relationships between those objects. We describe a method for identifying genes and relationships between proteins. RESULTS: We present solutions to gene mention normalization and extraction of protein-protein interactions. For the first task, we identify genes by using background knowledge on each gene, namely annotations related to function, location, disease, and so on. Our approach currently achieves an f-measure of 86.4% on the BioCreative II gene normalization data. For the extraction of protein-protein interactions, we pursue an approach that builds on classical sequence analysis: motifs derived from multiple sequence alignments. The method achieves an f-measure of 24.4% (micro-average) in the BioCreative II interaction pair subtask. CONCLUSION: For gene mention normalization, our approach outperforms strategies that utilize only the matching of gene names against dictionaries, without invoking further knowledge on each gene. Motifs derived from alignments of sentences are successful at identifying protein interactions in text; the approach we present in this report is fully automated and performs similarly to systems that require human intervention at one or more stages. AVAILABILITY: Our methods for gene, protein, and species identification, and extraction of protein-protein interactions are available as part of the BioCreative Meta Services (BCMS), see . |
format | Text |
id | pubmed-2559985 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2559985 2008-10-04 Gene mention normalization and interaction extraction with context models and sentence motifs Hakenberg, Jörg; Plake, Conrad; Royer, Loic; Strobelt, Hendrik; Leser, Ulf; Schroeder, Michael Genome Biol Research BioMed Central 2008 2008-09-01 /pmc/articles/PMC2559985/ /pubmed/18834492 http://dx.doi.org/10.1186/gb-2008-9-s2-s14 Text en Copyright © 2008 Hakenberg et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Hakenberg, Jörg Plake, Conrad Royer, Loic Strobelt, Hendrik Leser, Ulf Schroeder, Michael Gene mention normalization and interaction extraction with context models and sentence motifs |
title | Gene mention normalization and interaction extraction with context models and sentence motifs |
title_full | Gene mention normalization and interaction extraction with context models and sentence motifs |
title_fullStr | Gene mention normalization and interaction extraction with context models and sentence motifs |
title_full_unstemmed | Gene mention normalization and interaction extraction with context models and sentence motifs |
title_short | Gene mention normalization and interaction extraction with context models and sentence motifs |
title_sort | gene mention normalization and interaction extraction with context models and sentence motifs |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559985/ https://www.ncbi.nlm.nih.gov/pubmed/18834492 http://dx.doi.org/10.1186/gb-2008-9-s2-s14 |