Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS
We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract...
Main Authors: | Luo, Zhihui; Duffy, Robert; Johnson, Stephen; Weng, Chunhua |
---|---|
Format: | Text |
Language: | English |
Published: | American Medical Informatics Association, 2010 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041551/ https://www.ncbi.nlm.nih.gov/pubmed/21347142 |
_version_ | 1782198445120946176 |
---|---|
author | Luo, Zhihui; Duffy, Robert; Johnson, Stephen; Weng, Chunhua |
author_facet | Luo, Zhihui; Duffy, Robert; Johnson, Stephen; Weng, Chunhua |
author_sort | Luo, Zhihui |
collection | PubMed |
description | We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (=total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (=total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to completely eliminate ambiguity in semantic type assignment. The lexicon covered 95.95% UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus. |
format | Text |
id | pubmed-3041551 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | American Medical Informatics Association |
record_format | MEDLINE/PubMed |
spelling | pubmed-30415512011-02-23 Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS Luo, Zhihui Duffy, Robert Johnson, Stephen Weng, Chunhua Summit on Translat Bioinforma Articles We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (=total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (=total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to completely eliminate ambiguity in semantic type assignment. The lexicon covered 95.95% UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus. American Medical Informatics Association 2010-03-01 /pmc/articles/PMC3041551/ /pubmed/21347142 Text en ©2010 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose |
spellingShingle | Articles Luo, Zhihui Duffy, Robert Johnson, Stephen Weng, Chunhua Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS |
title | Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041551/ https://www.ncbi.nlm.nih.gov/pubmed/21347142 |
work_keys_str_mv | AT luozhihui corpusbasedapproachtocreatingasemanticlexiconforclinicalresearcheligibilitycriteriafromumls AT duffyrobert corpusbasedapproachtocreatingasemanticlexiconforclinicalresearcheligibilitycriteriafromumls AT johnsonstephen corpusbasedapproachtocreatingasemanticlexiconforclinicalresearcheligibilitycriteriafromumls AT wengchunhua corpusbasedapproachtocreatingasemanticlexiconforclinicalresearcheligibilitycriteriafromumls |
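
The description field defines two ambiguity measures: lexical ambiguity (total types for unique lexemes / total unique lexemes) and word occurrence ambiguity (total type occurrences / total word occurrences). The following is a minimal sketch of how those two ratios could be computed, not the authors' implementation. The function names, the `lexicon` mapping, and the toy tokens and type assignments are all hypothetical and chosen only for illustration; the record does not say how the authors tokenized text or looked up semantic types.

```python
from collections import Counter


def lexical_ambiguity(lexicon):
    """Total semantic types over unique lexemes / number of unique lexemes."""
    total_types = sum(len(types) for types in lexicon.values())
    return total_types / len(lexicon)


def word_occurrence_ambiguity(tokens, lexicon):
    """Total type occurrences / total word occurrences.

    Each occurrence of a word contributes one word occurrence and as many
    type occurrences as the word has semantic types. Tokens that are not
    in the lexicon are skipped (an assumption of this sketch).
    """
    counts = Counter(tok for tok in tokens if tok in lexicon)
    total_words = sum(counts.values())
    total_types = sum(n * len(lexicon[word]) for word, n in counts.items())
    return total_types / total_words


# Toy data (hypothetical): lexeme -> set of semantic types assigned to it.
lexicon = {
    "cold": {"Disease or Syndrome", "Natural Phenomenon or Process"},
    "aspirin": {"Pharmacologic Substance"},
    "pregnancy": {"Organism Function"},
}
# Toy corpus (hypothetical): normalized tokens in order of appearance.
tokens = ["cold", "aspirin", "cold", "pregnancy"]

print(lexical_ambiguity(lexicon))
print(word_occurrence_ambiguity(tokens, lexicon))
```

In this toy setup the occurrence-level ratio is higher than the lexeme-level ratio because the ambiguous word appears more often than the unambiguous ones, which is the same direction as the 1.57 versus 1.96 gap reported in the description.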