
Using contextual and lexical features to restructure and validate the classification of biomedical concepts

Bibliographic Details
Main authors:	Fan, Jung-Wei, Xu, Hua, Friedman, Carol
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2014782/ https://www.ncbi.nlm.nih.gov/pubmed/17650333 http://dx.doi.org/10.1186/1471-2105-8-264

_version_	1782136567045816320
author	Fan, Jung-Wei; Xu, Hua; Friedman, Carol
collection	PubMed
description	BACKGROUND: Biomedical ontologies are critical for integration of data from diverse sources and for use by knowledge-based biomedical applications, especially natural language processing as well as associated mining and reasoning systems. The effectiveness of these systems is heavily dependent on the quality of the ontological terms and their classifications. To assist in developing and maintaining the ontologies objectively, we propose automatic approaches to classify and/or validate their semantic categories. In previous work, we developed an approach using contextual syntactic features obtained from a large domain corpus to reclassify and validate concepts of the Unified Medical Language System (UMLS), a comprehensive resource of biomedical terminology. In this paper, we introduce another classification approach based on words of the concept strings and compare it to the contextual syntactic approach.
RESULTS: The string-based approach achieved an error rate of 0.143, with a mean reciprocal rank of 0.907. The context-based and string-based approaches were found to be complementary, and the error rate was reduced further by applying a linear combination of the two classifiers. The advantage of combining the two approaches was especially manifested on test data with sufficient contextual features, achieving the lowest error rate of 0.055 and a mean reciprocal rank of 0.969.
CONCLUSION: The lexical features provide another semantic dimension in addition to syntactic contextual features that support the classification of ontological concepts. The classification errors of each dimension can be further reduced through appropriate combination of the complementary classifiers.
format	Text
id	pubmed-2014782
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2014782 2007-10-11 BMC Bioinformatics Research Article BioMed Central 2007-07-24 /pmc/articles/PMC2014782/ /pubmed/17650333 http://dx.doi.org/10.1186/1471-2105-8-264 Text en Copyright © 2007 Fan et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Using contextual and lexical features to restructure and validate the classification of biomedical concepts
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2014782/ https://www.ncbi.nlm.nih.gov/pubmed/17650333 http://dx.doi.org/10.1186/1471-2105-8-264
