Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™

MOTIVATION: With the increasing volume of scientific papers and heterogeneous nomenclature in the biomedical literature, it is apparent that an improvement over standard pattern matching available in existing search engines is required. Cognition Search Information Retrieval (CSIR) is a natural lang...

Bibliographic Details
Main authors: Goldsmith, Elizabeth J., Mendiratta, Saurabh, Akella, Radha, Dahlgren, Kathleen
Format: Text
Language: English
Published: American Medical Informatics Association 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041583/
https://www.ncbi.nlm.nih.gov/pubmed/21347167
_version_ 1782198452722073600
author Goldsmith, Elizabeth J.
Mendiratta, Saurabh
Akella, Radha
Dahlgren, Kathleen
author_sort Goldsmith, Elizabeth J.
collection PubMed
description MOTIVATION: With the increasing volume of scientific papers and heterogeneous nomenclature in the biomedical literature, it is apparent that an improvement over standard pattern matching available in existing search engines is required. Cognition Search Information Retrieval (CSIR) is a natural language processing (NLP) technology that possesses a large dictionary (lexicon) and large semantic databases, such that search can be based on meaning. Encoded synonymy, ontological relationships, phrases, and seeds for word sense disambiguation offer significant improvement over pattern matching. Thus, the CSIR has the right architecture to form the basis for a scientific search engine. RESULT: Here we have augmented CSIR to improve access to the MEDLINE database of scientific abstracts. New biochemical, molecular biological and medical language and acronyms were introduced from curated web-based sources. The resulting system was used to interpret MEDLINE abstracts. Meaning-based search of MEDLINE abstracts yields high precision (estimated at >90%), and high recall (estimated at >90%), where synonym, ontology, phrases and sense seeds have been encoded. The present implementation can be found at http://MEDLINE.cognition.com. CONTACT: Elizabeth.goldsmith@UTsouthwestern.edu Kathleen.dahlgren@cognition.com
format Text
id pubmed-3041583
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-3041583 2011-02-23 Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™ Goldsmith, Elizabeth J. Mendiratta, Saurabh Akella, Radha Dahlgren, Kathleen Summit on Translat Bioinforma Articles American Medical Informatics Association 2009-03-01 /pmc/articles/PMC3041583/ /pubmed/21347167 Text en ©2009 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
title Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041583/
https://www.ncbi.nlm.nih.gov/pubmed/21347167