
Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS

We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract...


Bibliographic Details
Main authors: Luo, Zhihui, Duffy, Robert, Johnson, Stephen, Weng, Chunhua
Format: Text
Language: English
Published: American Medical Informatics Association 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041551/
https://www.ncbi.nlm.nih.gov/pubmed/21347142
author Luo, Zhihui
Duffy, Robert
Johnson, Stephen
Weng, Chunhua
collection PubMed
description We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (=total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (=total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to completely eliminate ambiguity in semantic type assignment. The lexicon covered 95.95% UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus.
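The two ambiguity ratios defined in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy lexicon, and the placeholder type labels (`type_A`, `type_B`, ...) are hypothetical and are not real UMLS Semantic Network assignments. The sketch assumes a lexicon that maps each unique lexeme to its set of candidate semantic types, and a token list drawn from the corpus.

```python
from collections import Counter


def lexical_ambiguity(lexicon):
    """Total types over all unique lexemes / total unique lexemes."""
    total_types = sum(len(types) for types in lexicon.values())
    return total_types / len(lexicon)


def occurrence_ambiguity(tokens, lexicon):
    """Total type occurrences / total word occurrences.

    Each occurrence of a word contributes one count per candidate
    type of that word. Tokens missing from the lexicon are ignored.
    """
    counts = Counter(t for t in tokens if t in lexicon)
    type_occurrences = sum(len(lexicon[w]) * n for w, n in counts.items())
    return type_occurrences / sum(counts.values())


# Hypothetical toy data; the labels are placeholders, not UMLS types.
lexicon = {
    "cold": {"type_A", "type_B"},
    "aspirin": {"type_C"},
    "history": {"type_A", "type_D", "type_E"},
}
tokens = ["cold", "aspirin", "aspirin", "aspirin", "history"]

lex_amb = lexical_ambiguity(lexicon)             # 6 types / 3 lexemes
occ_amb = occurrence_ambiguity(tokens, lexicon)  # 8 type occurrences / 5 words
print(lex_amb, occ_amb)
```

In this toy example the occurrence-level ratio is lower than the lexeme-level ratio because the unambiguous word is the most frequent one. In the corpus described above the relation is reversed (1.96 versus 1.57), which is what the formulas yield when ambiguous words occur more often than unambiguous ones.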
format Text
id pubmed-3041551
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-3041551 2011-02-23 Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS Luo, Zhihui Duffy, Robert Johnson, Stephen Weng, Chunhua Summit on Translat Bioinforma Articles We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (=total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (=total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to completely eliminate ambiguity in semantic type assignment. The lexicon covered 95.95% UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus. American Medical Informatics Association 2010-03-01 /pmc/articles/PMC3041551/ /pubmed/21347142 Text en ©2010 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
title Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041551/
https://www.ncbi.nlm.nih.gov/pubmed/21347142