Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™
MOTIVATION: With the increasing volume of scientific papers and heterogeneous nomenclature in the biomedical literature, it is apparent that an improvement over standard pattern matching available in existing search engines is required. Cognition Search Information Retrieval (CSIR) is a natural lang...
Main Authors: | Goldsmith, Elizabeth J.; Mendiratta, Saurabh; Akella, Radha; Dahlgren, Kathleen |
---|---|
Format: | Text |
Language: | English |
Published: | American Medical Informatics Association 2009 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041583/ https://www.ncbi.nlm.nih.gov/pubmed/21347167 |
_version_ | 1782198452722073600 |
---|---|
author | Goldsmith, Elizabeth J. Mendiratta, Saurabh Akella, Radha Dahlgren, Kathleen |
author_facet | Goldsmith, Elizabeth J. Mendiratta, Saurabh Akella, Radha Dahlgren, Kathleen |
author_sort | Goldsmith, Elizabeth J. |
collection | PubMed |
description | MOTIVATION: With the increasing volume of scientific papers and heterogeneous nomenclature in the biomedical literature, it is apparent that an improvement over standard pattern matching available in existing search engines is required. Cognition Search Information Retrieval (CSIR) is a natural language processing (NLP) technology that possesses a large dictionary (lexicon) and large semantic databases, such that search can be based on meaning. Encoded synonymy, ontological relationships, phrases, and seeds for word sense disambiguation offer significant improvement over pattern matching. Thus, the CSIR has the right architecture to form the basis for a scientific search engine. RESULT: Here we have augmented CSIR to improve access to the MEDLINE database of scientific abstracts. New biochemical, molecular biological and medical language and acronyms were introduced from curated web-based sources. The resulting system was used to interpret MEDLINE abstracts. Meaning-based search of MEDLINE abstracts yields high precision (estimated at >90%), and high recall (estimated at >90%), where synonym, ontology, phrases and sense seeds have been encoded. The present implementation can be found at http://MEDLINE.cognition.com. CONTACT: Elizabeth.goldsmith@UTsouthwestern.edu Kathleen.dahlgren@cognition.com |
format | Text |
id | pubmed-3041583 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | American Medical Informatics Association |
record_format | MEDLINE/PubMed |
spelling | pubmed-3041583 2011-02-23 Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™ Goldsmith, Elizabeth J. Mendiratta, Saurabh Akella, Radha Dahlgren, Kathleen Summit on Translat Bioinforma Articles MOTIVATION: With the increasing volume of scientific papers and heterogeneous nomenclature in the biomedical literature, it is apparent that an improvement over standard pattern matching available in existing search engines is required. Cognition Search Information Retrieval (CSIR) is a natural language processing (NLP) technology that possesses a large dictionary (lexicon) and large semantic databases, such that search can be based on meaning. Encoded synonymy, ontological relationships, phrases, and seeds for word sense disambiguation offer significant improvement over pattern matching. Thus, the CSIR has the right architecture to form the basis for a scientific search engine. RESULT: Here we have augmented CSIR to improve access to the MEDLINE database of scientific abstracts. New biochemical, molecular biological and medical language and acronyms were introduced from curated web-based sources. The resulting system was used to interpret MEDLINE abstracts. Meaning-based search of MEDLINE abstracts yields high precision (estimated at >90%), and high recall (estimated at >90%), where synonym, ontology, phrases and sense seeds have been encoded. The present implementation can be found at http://MEDLINE.cognition.com. CONTACT: Elizabeth.goldsmith@UTsouthwestern.edu Kathleen.dahlgren@cognition.com American Medical Informatics Association 2009-03-01 /pmc/articles/PMC3041583/ /pubmed/21347167 Text en ©2009 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose |
spellingShingle | Articles Goldsmith, Elizabeth J. Mendiratta, Saurabh Akella, Radha Dahlgren, Kathleen Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™ |
title | Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™ |
title_full | Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™ |
title_fullStr | Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™ |
title_full_unstemmed | Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™ |
title_short | Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™ |
title_sort | natural language query in the biochemistry and molecular biology domains based on cognition search™ |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041583/ https://www.ncbi.nlm.nih.gov/pubmed/21347167 |
work_keys_str_mv | AT goldsmithelizabethj naturallanguagequeryinthebiochemistryandmolecularbiologydomainsbasedoncognitionsearch AT mendirattasaurabh naturallanguagequeryinthebiochemistryandmolecularbiologydomainsbasedoncognitionsearch AT akellaradha naturallanguagequeryinthebiochemistryandmolecularbiologydomainsbasedoncognitionsearch AT dahlgrenkathleen naturallanguagequeryinthebiochemistryandmolecularbiologydomainsbasedoncognitionsearch |