
Evaluation of Co-occurring Terms in Clinical Documents Using Latent Semantic Indexing

Bibliographic Details
Main authors: Han, Choonghyun, Yoo, Sooyoung, Choi, Jinwook
Format: Text
Language: English
Published: Korean Society of Medical Informatics 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3092990/
https://www.ncbi.nlm.nih.gov/pubmed/21818454
http://dx.doi.org/10.4258/hir.2011.17.1.24
Description
Summary: OBJECTIVES: Measurement of similarities between documents is typically influenced by the sparseness of the term-document matrix employed. Latent semantic indexing (LSI) may improve the results of this type of analysis. METHODS: In this study, LSI was utilized in an attempt to reduce the term vector space of clinical documents and newspaper editorials. RESULTS: After applying LSI, document similarities were revealed more clearly in clinical documents than editorials. Clinical documents which can be characterized with co-occurring medical terms, various expressions for the same concepts, abbreviations, and typographical errors showed increased improvement with regards to a correlation between co-occurring terms and document similarities. CONCLUSIONS: Our results showed that LSI can be used effectively to measure similarities in clinical documents. In addition, correlation between the co-occurrence of terms and similarities realized in this study is an important positive feature associated with LSI.
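
Illustrative note: the summary describes reducing a sparse term-document matrix with LSI before measuring document similarity. The sketch below shows one common way to do this, a truncated singular value decomposition followed by cosine similarity. It is not the authors' implementation. The toy vocabulary, the document counts, the choice of k = 2, and the function names `cosine_similarity_matrix` and `lsi_document_vectors` are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = vectors / norms
    return unit @ unit.T


def lsi_document_vectors(term_doc, k):
    """Represent each document in a k-dimensional latent space (truncated SVD)."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each row of V_k scaled by the top-k singular values is one document.
    return vt[:k].T * s[:k]


# Invented toy data (not from the study): rows are terms, columns are documents.
# Documents 0-2 are "clinical", documents 3-4 are "editorial".
terms = ["mi", "myocardial", "infarction", "chest", "election", "vote"]
term_doc = np.array([
    [1, 0, 0, 0, 0],  # mi (abbreviation)
    [0, 1, 1, 0, 0],  # myocardial
    [0, 1, 1, 0, 0],  # infarction
    [1, 1, 0, 0, 0],  # chest
    [0, 0, 0, 1, 1],  # election
    [0, 0, 0, 1, 1],  # vote
], dtype=float)

# Similarity in the original sparse term space (documents are columns).
raw_sim = cosine_similarity_matrix(term_doc.T)

# Similarity after projecting into a 2-dimensional latent space (k is assumed).
lsi_sim = cosine_similarity_matrix(lsi_document_vectors(term_doc, k=2))

print("raw similarity, doc 0 vs doc 2:", round(float(raw_sim[0, 2]), 3))
print("LSI similarity, doc 0 vs doc 2:", round(float(lsi_sim[0, 2]), 3))
print("LSI similarity, doc 0 vs doc 3:", round(float(lsi_sim[0, 3]), 3))
```

In this toy setup, documents 0 and 2 share no terms, so their similarity in the raw term space is zero. Both co-occur with terms in document 1, and the reduced space places them close together while keeping them apart from the editorial documents. This mirrors, in a very simplified way, the link between term co-occurrence and document similarity that the summary reports.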