
Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis

Any document in the Serbian language can be written in either of two scripts: Latin or Cyrillic. Although the characteristics of these scripts are similar, some of their statistical measures differ considerably. The paper proposes a method for identifying the script of a document according to the...


Bibliographic Details
Main authors:	Brodić, Darko; Milivojević, Zoran N.; Maluckov, Čedomir A.
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2013
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3872444/ https://www.ncbi.nlm.nih.gov/pubmed/24385887 http://dx.doi.org/10.1155/2013/896328

_version_	1782296973090488320
author	Brodić, Darko; Milivojević, Zoran N.; Maluckov, Čedomir A.
author_sort	Brodić, Darko
collection	PubMed
description	Any document in the Serbian language can be written in either of two scripts: Latin or Cyrillic. Although the characteristics of these scripts are similar, some of their statistical measures differ considerably. The paper proposes a method for identifying the script of a document according to the occurrence and co-occurrence of script types. First, each letter is assigned a script type according to its position relative to the baseline area. Then, a frequency analysis of script-type occurrence is performed. Because the Latin and Cyrillic scripts differ, the occurrence statistics of the modeled letters are substantially dissimilar. Furthermore, the co-occurrence matrix is computed. Analysis of the co-occurrence matrix yields a wide margin that serves as a criterion for distinguishing and recognizing the script. The proposed method is evaluated on a database that includes different types of printed and web documents. The experiments gave encouraging results.
format	Online Article Text
id	pubmed-3872444
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-3872444 2014-01-02 Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis Brodić, Darko Milivojević, Zoran N. Maluckov, Čedomir A. ScientificWorldJournal Research Article Hindawi Publishing Corporation 2013-12-10 /pmc/articles/PMC3872444/ /pubmed/24385887 http://dx.doi.org/10.1155/2013/896328 Text en Copyright © 2013 Darko Brodić et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis
topic	Research Article
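
The description above outlines three steps: assigning each letter a script type from its position relative to the baseline area, counting how often each type occurs, and computing a co-occurrence matrix. The following is a minimal sketch of those steps, not the authors' implementation. The four type names, the small Latin lookup table, and the choice to count only immediately adjacent letter pairs (with unlisted characters skipped, so pairs can span word boundaries) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical script types, named after where a glyph sits relative
# to the baseline area. The record does not list the actual types.
TYPES = ("base", "ascender", "descender", "full")

# Toy, incomplete lowercase Latin table (an assumption, not from the paper).
LATIN_TYPES = {
    **dict.fromkeys("aceomnrsuvz", "base"),
    **dict.fromkeys("bdhklt", "ascender"),
    **dict.fromkeys("gp", "descender"),
}


def encode(text, table):
    """Map each known letter to its script type; skip characters not in the table."""
    return [table[ch] for ch in text.lower() if ch in table]


def occurrence(codes):
    """Return the relative frequency of each script type."""
    counts = Counter(codes)
    total = len(codes)
    return {t: (counts[t] / total if total else 0.0) for t in TYPES}


def cooccurrence(codes):
    """Count adjacent (left, right) type pairs in a len(TYPES) x len(TYPES) matrix."""
    index = {t: i for i, t in enumerate(TYPES)}
    matrix = [[0] * len(TYPES) for _ in TYPES]
    for left, right in zip(codes, codes[1:]):
        matrix[index[left]][index[right]] += 1
    return matrix


codes = encode("beograd", LATIN_TYPES)
print(occurrence(codes))
print(cooccurrence(codes))
```

To compare scripts in this sketch, one would build a second table for Cyrillic letters and run the same functions on the same text in both scripts. How the resulting statistics are turned into a decision criterion is not stated in this record.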
