OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

BACKGROUND: Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several ev...

Bibliographic Details
Main authors: Hunter, Lawrence; Lu, Zhiyong; Firby, James; Baumgartner, William A; Johnson, Helen L; Ogren, Philip V; Cohen, K Bretonnel
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275248/
https://www.ncbi.nlm.nih.gov/pubmed/18237434
http://dx.doi.org/10.1186/1471-2105-9-78
author Hunter, Lawrence
Lu, Zhiyong
Firby, James
Baumgartner, William A
Johnson, Helen L
Ogren, Philip V
Cohen, K Bretonnel
author_sort Hunter, Lawrence
collection PubMed
description BACKGROUND: Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. RESULTS: OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from .26 – .72 (precision .39 – .85, recall .16 – .85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. CONCLUSION: OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. 
The open source OpenDMAP code library is freely available at
format Text
id pubmed-2275248
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2275248 2008-03-26 OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression Hunter, Lawrence Lu, Zhiyong Firby, James Baumgartner, William A Johnson, Helen L Ogren, Philip V Cohen, K Bretonnel BMC Bioinformatics Software BioMed Central 2008-01-31 /pmc/articles/PMC2275248/ /pubmed/18237434 http://dx.doi.org/10.1186/1471-2105-9-78 Text en Copyright © 2008 Hunter et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Hunter, Lawrence
Lu, Zhiyong
Firby, James
Baumgartner, William A
Johnson, Helen L
Ogren, Philip V
Cohen, K Bretonnel
OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression
title OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2275248/
https://www.ncbi.nlm.nih.gov/pubmed/18237434
http://dx.doi.org/10.1186/1471-2105-9-78