
A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences

This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information r...

Bibliographic Details
Main authors: Lee, Ming Che; Chang, Jia Wei; Hsieh, Tung Cheng
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005080/
https://www.ncbi.nlm.nih.gov/pubmed/24982952
http://dx.doi.org/10.1155/2014/437162
author Lee, Ming Che
Chang, Jia Wei
Hsieh, Tung Cheng
collection PubMed
description This paper presents a grammar and semantic corpus based similarity algorithm for natural language sentences. Natural language, in opposition to “artificial language”, such as computer programming languages, is the language used by the general public for daily communication. Traditional information retrieval approaches, such as vector models, LSA, HAL, or even the ontology-based approaches that extend to include concept similarity comparison instead of cooccurrence terms/words, may not always determine the perfect matching while there is no obvious relation or concept overlap between two natural language sentences. This paper proposes a sentence similarity algorithm that takes advantage of corpus-based ontology and grammatical rules to overcome the addressed problems. Experiments on two famous benchmarks demonstrate that the proposed algorithm has a significant performance improvement in sentences/short-texts with arbitrary syntax and structure.
format Online
Article
Text
id pubmed-4005080
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4005080 2014-06-30 A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences. Lee, Ming Che; Chang, Jia Wei; Hsieh, Tung Cheng. ScientificWorldJournal. Research Article. Hindawi Publishing Corporation 2014. 2014-04-10. /pmc/articles/PMC4005080/ /pubmed/24982952 http://dx.doi.org/10.1155/2014/437162 Text en. Copyright © 2014 Ming Che Lee et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Grammar-Based Semantic Similarity Algorithm for Natural Language Sentences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4005080/
https://www.ncbi.nlm.nih.gov/pubmed/24982952
http://dx.doi.org/10.1155/2014/437162