Automated methods of textual content analysis and description of text structures

Universal Semantic Language (USL) is a semi-formalized approach for the description of knowledge (a knowledge representation tool). The idea of USL was introduced by Vladimir Smetacek in the system called SEMAN which was used for keyword extraction tasks in the former Information centre of the Czech...

Bibliographic Details
Main author:	Chýla, Roman
Language:	eng
Published:	2012
Subjects:	Computing and Computers
Online access:	http://cds.cern.ch/record/1450189

_version_	1780924911005466624
author	Chýla, Roman
collection	CERN
description	Universal Semantic Language (USL) is a semi-formalized approach for the description of knowledge (a knowledge representation tool). The idea of USL was introduced by Vladimir Smetacek in the system called SEMAN, which was used for keyword extraction tasks in the former Information centre of the Czechoslovak Republic. However, due to the dissolution of the centre in the early 1990s, the system was lost. This thesis reintroduces the idea of USL in a new context of quantitative content analysis. First, we introduce the historical background and the problems of semantics and knowledge representation, semes, semantic fields, semantic primes and universals. The basic methodology of content analysis studies is illustrated on the example of three content analysis tools, and we describe the architecture of a new system. The application was built specifically for USL discovery, but it can also work in the context of classical content analysis. It contains Natural Language Processing (NLP) components and employs an algorithm for collocation discovery adapted to search for co-occurrences between semantic annotations. The software is evaluated by comparing its pattern matching mechanism against another existing and established extractor. The semantic translation mechanism is evaluated in the task of automated document classification, with special attention to the problem of semantic ambiguity and correct translation. Finally, we evaluate the ability of the system to discover statistically significant semantic relationships from textual corpora.
id	cern-1450189
institution	Organización Europea para la Investigación Nuclear
language	eng
publishDate	2012
record_format	invenio
spelling	cern-1450189 2019-09-30T06:29:59Z CERN-THESIS-2011-239 oai:cds.cern.ch:1450189 2012-05-22T08:31:21Z
title	Automated methods of textual content analysis and description of text structures
topic	Computing and Computers
url	http://cds.cern.ch/record/1450189
