An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling

Bibliographic Details
Main Authors: Devi, R. Suganya, Manjula, D., Siddharth, R. K.
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4475586/
https://www.ncbi.nlm.nih.gov/pubmed/26137592
http://dx.doi.org/10.1155/2015/739286
collection PubMed
description Web crawling has gained considerable significance in recent times, in step with the substantial growth of the World Wide Web. Web search engines face new challenges due to the vast number of web documents available, which makes the retrieved results less applicable to analysers. Current web crawling, however, focuses solely on obtaining the links of the corresponding documents. Various algorithms and software tools exist to crawl links from the web, but those links have to be processed further before they can be used, which increases the analyser's workload. This paper concentrates on crawling the links and retrieving all information associated with them to facilitate easy processing for other uses. First, the links are crawled from the specified uniform resource locator (URL) using a modified version of the depth-first search algorithm, which allows complete hierarchical scanning of the corresponding web links. Each link is then accessed via its source code, and its metadata, such as title, keywords, and description, are extracted. This content is essential for any analysis to be carried out on the Big Data obtained as a result of web crawling.
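The two steps named in the description (depth-first link crawling from a seed URL, then extraction of title, keywords, and description from each page's source) can be illustrated with a minimal Python sketch. It uses a plain depth-limited depth-first search, not the authors' modified algorithm, whose details this record does not give. The names `PageParser`, `fetch`, `crawl`, `fetch_page`, and `max_depth` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects hyperlinks plus the title, keywords, and description of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] += data.strip()


def fetch(url):
    """Hypothetical default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed, fetch_page=fetch, max_depth=2):
    """Depth-first crawl from `seed`; returns {url: metadata} in visit order."""
    index = {}

    def visit(url, depth):
        # Skip pages already indexed and pages beyond the depth limit.
        if url in index or depth > max_depth:
            return
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            return  # unreachable page or unsupported URL: skip it
        parser = PageParser()
        parser.feed(html)
        index[url] = parser.meta
        # Follow each link fully before moving to its sibling (depth-first).
        for href in parser.links:
            child, _fragment = urldefrag(urljoin(url, href))
            visit(child, depth + 1)

    visit(seed, 0)
    return index
```

Passing the fetcher as a parameter keeps the traversal testable without network access. A real crawler would also need politeness controls such as robots.txt handling and rate limiting, which this sketch leaves out.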
id pubmed-4475586
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4475586 2015-07-01 An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling Devi, R. Suganya Manjula, D. Siddharth, R. K. ScientificWorldJournal Research Article Hindawi Publishing Corporation 2015 2015-06-07 /pmc/articles/PMC4475586/ /pubmed/26137592 http://dx.doi.org/10.1155/2015/739286 Text en Copyright © 2015 R. Suganya Devi et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article