
Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens

BACKGROUND: The Enteropathogen Resource Integration Center (ERIC) has a goal of providing bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions from the sc...


Bibliographic Details
Main authors: Zaremba, Sam; Ramos-Santacruz, Mila; Hampton, Thomas; Shetty, Panna; Fedorko, Joel; Whitmore, Jon; Greene, John M; Perna, Nicole T; Glasner, Jeremy D; Plunkett, Guy; Shaker, Matthew; Pot, David
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Database
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704210/
https://www.ncbi.nlm.nih.gov/pubmed/19515247
http://dx.doi.org/10.1186/1471-2105-10-177
_version_ 1782168916136558592
author Zaremba, Sam
Ramos-Santacruz, Mila
Hampton, Thomas
Shetty, Panna
Fedorko, Joel
Whitmore, Jon
Greene, John M
Perna, Nicole T
Glasner, Jeremy D
Plunkett, Guy
Shaker, Matthew
Pot, David
author_facet Zaremba, Sam
Ramos-Santacruz, Mila
Hampton, Thomas
Shetty, Panna
Fedorko, Joel
Whitmore, Jon
Greene, John M
Perna, Nicole T
Glasner, Jeremy D
Plunkett, Guy
Shaker, Matthew
Pot, David
author_sort Zaremba, Sam
collection PubMed
description BACKGROUND: The Enteropathogen Resource Integration Center (ERIC) has a goal of providing bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions from the scientific literature is critical to support research in this field. Natural Language Processing (NLP), and in particular Information Extraction (IE) technology, can be a significant aid to this process. DESCRIPTION: We have trained a powerful, state-of-the-art IE technology on a corpus of abstracts from the microbial literature in PubMed to automatically identify and categorize biologically relevant entities and predicative relations. These relations include: Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an F-measure average of greater than 90% for entities (genes, operons, etc.) and over 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text mining application. CONCLUSION: Our Text Mining application is available online on the ERIC website. The information retrieval interface displays a list of recently published enteropathogen literature abstracts, and also provides a search interface to execute custom queries by keyword, date range, etc. Upon selection, processed abstracts and the entities and relations extracted from them are retrieved from a relational database and marked up to highlight the entities and relations. The abstract also provides links from extracted genes and gene products to the ERIC Annotations database, thus providing access to comprehensive genomic annotations and adding value to both the text-mining and annotations systems.
format Text
id pubmed-2704210
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2704210 2009-07-01 Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens Zaremba, Sam Ramos-Santacruz, Mila Hampton, Thomas Shetty, Panna Fedorko, Joel Whitmore, Jon Greene, John M Perna, Nicole T Glasner, Jeremy D Plunkett, Guy Shaker, Matthew Pot, David BMC Bioinformatics Database BACKGROUND: The Enteropathogen Resource Integration Center (ERIC) has a goal of providing bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions from the scientific literature is critical to support research in this field. Natural Language Processing (NLP), and in particular Information Extraction (IE) technology, can be a significant aid to this process. DESCRIPTION: We have trained a powerful, state-of-the-art IE technology on a corpus of abstracts from the microbial literature in PubMed to automatically identify and categorize biologically relevant entities and predicative relations. These relations include: Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an F-measure average of greater than 90% for entities (genes, operons, etc.) and over 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text mining application. CONCLUSION: Our Text Mining application is available online on the ERIC website. The information retrieval interface displays a list of recently published enteropathogen literature abstracts, and also provides a search interface to execute custom queries by keyword, date range, etc. Upon selection, processed abstracts and the entities and relations extracted from them are retrieved from a relational database and marked up to highlight the entities and relations. The abstract also provides links from extracted genes and gene products to the ERIC Annotations database, thus providing access to comprehensive genomic annotations and adding value to both the text-mining and annotations systems. BioMed Central 2009-06-10 /pmc/articles/PMC2704210/ /pubmed/19515247 http://dx.doi.org/10.1186/1471-2105-10-177 Text en Copyright © 2009 Zaremba et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Database
Zaremba, Sam
Ramos-Santacruz, Mila
Hampton, Thomas
Shetty, Panna
Fedorko, Joel
Whitmore, Jon
Greene, John M
Perna, Nicole T
Glasner, Jeremy D
Plunkett, Guy
Shaker, Matthew
Pot, David
Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens
title Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens
title_full Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens
title_fullStr Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens
title_full_unstemmed Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens
title_short Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens
title_sort text-mining of pubmed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens
topic Database
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704210/
https://www.ncbi.nlm.nih.gov/pubmed/19515247
http://dx.doi.org/10.1186/1471-2105-10-177