The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction

A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words responsible for conveying the meaning of a text have dimensions considerably different from one, while th...

Bibliographic Details
Main authors: Najafi, Elham, Darooneh, Amir H.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4474631/
https://www.ncbi.nlm.nih.gov/pubmed/26091207
http://dx.doi.org/10.1371/journal.pone.0130617
_version_ 1782377307107753984
author Najafi, Elham
Darooneh, Amir H.
collection PubMed
description A text can be considered as a one-dimensional array of words. The locations of each word type in this array form a fractal pattern with a certain fractal dimension. We observe that important words, those responsible for conveying the meaning of a text, have dimensions considerably different from one, while the fractal dimensions of unimportant words are close to one. We introduce an index that quantifies the importance of the words in a given text using their fractal dimensions, and we rank the words according to this index. The index measures the difference between the fractal pattern of a word in the original text and its pattern in a shuffled version. Because the shuffled text is meaningless (i.e., words have no importance), the difference between the original and shuffled text can be used to ascertain the degree of fractality. The degree of fractality may be used for automatic keyword detection: words with a degree of fractality higher than a threshold value are taken as the retrieved keywords of the text. We measure the efficiency of our method for keyword extraction by comparing it with two other well-known methods of automatic keyword extraction.
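The shuffle comparison described in this abstract can be sketched in Python. This is a minimal illustration, not the authors' implementation: the record does not give the formula for the index, so the score used here (the summed log-ratio of shuffled to original box counts) is an assumption, as are all function names, the power-of-two box sizes, and the `n_shuffles`, `min_count`, and `seed` parameters.

```python
import math
import random
from collections import defaultdict


def box_counts(positions, sizes):
    """For each box size s, count the boxes of s consecutive words holding the word."""
    return [len({p // s for p in positions}) for s in sizes]


def box_dimension(positions, sizes):
    """Box-counting estimate: least-squares slope of log(count) vs log(1/size).

    Needs at least one position and at least two distinct box sizes.
    """
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(c) for c in box_counts(positions, sizes)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def positions_by_word(words):
    """Map each word type to the list of its positions in the word array."""
    index = defaultdict(list)
    for i, w in enumerate(words):
        index[w].append(i)
    return index


def fractality_scores(words, n_shuffles=20, min_count=3, seed=0):
    """Score each word by how far its box counts fall below the shuffled baseline.

    Assumed score (not taken from the paper): sum over box sizes of
    mean log(shuffled count) - log(original count). Clustered words occupy
    fewer boxes than in a shuffled text, so they score above zero.
    """
    n = len(words)
    if n < 2:
        return {}
    rng = random.Random(seed)
    sizes = [2 ** k for k in range(int(math.log2(n)))]
    shuffled_log = defaultdict(lambda: [0.0] * len(sizes))
    pool = list(words)
    for _ in range(n_shuffles):
        rng.shuffle(pool)  # destroys word order, keeps word frequencies
        for w, pos in positions_by_word(pool).items():
            if len(pos) < min_count:
                continue
            for j, c in enumerate(box_counts(pos, sizes)):
                shuffled_log[w][j] += math.log(c) / n_shuffles
    scores = {}
    for w, pos in positions_by_word(words).items():
        if len(pos) < min_count:
            continue  # too few occurrences to say anything about the pattern
        orig_log = [math.log(c) for c in box_counts(pos, sizes)]
        scores[w] = sum(s - o for s, o in zip(shuffled_log[w], orig_log))
    return scores


def extract_keywords(words, threshold, **kwargs):
    """Return words whose score exceeds the threshold, highest score first."""
    ranked = sorted(fractality_scores(words, **kwargs).items(),
                    key=lambda kv: kv[1], reverse=True)
    return [w for w, s in ranked if s > threshold]
```

Calling `extract_keywords(text.lower().split(), threshold=1.0)` on a tokenized text would return the candidate keywords; the threshold value here is arbitrary and would need tuning per corpus.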
format Online
Article
Text
id pubmed-4474631
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4474631 2015-06-30 PLoS One Research Article Public Library of Science 2015-06-19 /pmc/articles/PMC4474631/ /pubmed/26091207 http://dx.doi.org/10.1371/journal.pone.0130617 Text en © 2015 Najafi, Darooneh http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction
topic Research Article