An analysis of gene/protein associations at PubMed scale
BACKGROUND: Event extraction following the GENIA Event corpus and BioNLP shared task models has been a considerable focus of recent work in biomedical information extraction. This work includes efforts applying event extraction methods to the entire PubMed literature database, far beyond the narrow...
Main authors: | Pyysalo, Sampo; Ohta, Tomoko; Tsujii, Jun’ichi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3239305/ https://www.ncbi.nlm.nih.gov/pubmed/22166173 http://dx.doi.org/10.1186/2041-1480-2-S5-S5 |
_version_ | 1782219163954053120 |
---|---|
author | Pyysalo, Sampo; Ohta, Tomoko; Tsujii, Jun’ichi |
author_sort | Pyysalo, Sampo |
collection | PubMed |
description | BACKGROUND: Event extraction following the GENIA Event corpus and BioNLP shared task models has been a considerable focus of recent work in biomedical information extraction. This work includes efforts applying event extraction methods to the entire PubMed literature database, far beyond the narrow subdomains of biomedicine for which annotated resources for extraction method development are available. RESULTS: In the present study, our aim is to estimate the coverage of all statements of gene/protein associations in PubMed that existing resources for event extraction can provide. We base our analysis on a recently released corpus automatically annotated for gene/protein entities and syntactic analyses covering the entire PubMed, and use named entity co-occurrence, shortest dependency paths and an unlexicalized classifier to identify likely statements of gene/protein associations. A set of high-frequency/high-likelihood association statements are then manually analyzed with reference to the GENIA ontology. CONCLUSIONS: We present a first estimate of the overall coverage of gene/protein associations provided by existing resources for event extraction. Our results suggest that for event-type associations this coverage may be over 90%. We also identify several biologically significant associations of genes and proteins that are not addressed by these resources, suggesting directions for further extension of extraction coverage. |
format | Online Article Text |
id | pubmed-3239305 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3239305 2011-12-16 An analysis of gene/protein associations at PubMed scale Pyysalo, Sampo Ohta, Tomoko Tsujii, Jun’ichi J Biomed Semantics Research BioMed Central 2011-10-06 /pmc/articles/PMC3239305/ /pubmed/22166173 http://dx.doi.org/10.1186/2041-1480-2-S5-S5 Text en Copyright ©2011 Pyysalo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | An analysis of gene/protein associations at PubMed scale |
topic | Research |