An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling

Web Crawling has acquired tremendous significance in recent times and it is aptly associated with the substantial development of the World Wide Web. Web Search Engines face new challenges due to the availability of vast amounts of web documents, thus making the retrieved results less applicable to t...

Bibliographic Details
Main authors:	Devi, R. Suganya; Manjula, D.; Siddharth, R. K.
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4475586/ https://www.ncbi.nlm.nih.gov/pubmed/26137592 http://dx.doi.org/10.1155/2015/739286

_version_	1782377478164054016
author	Devi, R. Suganya; Manjula, D.; Siddharth, R. K.
author_facet	Devi, R. Suganya; Manjula, D.; Siddharth, R. K.
author_sort	Devi, R. Suganya
collection	PubMed
description	Web Crawling has acquired tremendous significance in recent times and it is aptly associated with the substantial development of the World Wide Web. Web Search Engines face new challenges due to the availability of vast amounts of web documents, thus making the retrieved results less applicable to the analysers. Recent Web Crawling work, however, focuses solely on obtaining the links of the corresponding documents. Various existing algorithms and software tools crawl links from the web, but those links have to be processed further before they can be used, which increases the load on the analyser. This paper concentrates on crawling the links and retrieving all information associated with them to facilitate easy processing for other uses. First, the links are crawled from the specified uniform resource locator (URL) using a modified version of the Depth First Search algorithm, which allows for complete hierarchical scanning of the corresponding web links. Each link is then accessed via its source code, and its metadata, such as title, keywords, and description, is extracted. This content is essential for any analysis to be carried out on the Big Data obtained as a result of Web Crawling.
format	Online Article Text
id	pubmed-4475586
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-4475586 2015-07-01 An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling Devi, R. Suganya Manjula, D. Siddharth, R. K. ScientificWorldJournal Research Article Hindawi Publishing Corporation 2015 2015-06-07 /pmc/articles/PMC4475586/ /pubmed/26137592 http://dx.doi.org/10.1155/2015/739286 Text en Copyright © 2015 R. Suganya Devi et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Devi, R. Suganya Manjula, D. Siddharth, R. K. An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling
title	An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling
title_sort	efficient approach for web indexing of big data through hyperlinks in web crawling
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4475586/ https://www.ncbi.nlm.nih.gov/pubmed/26137592 http://dx.doi.org/10.1155/2015/739286
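
The two steps named in the description (depth-first crawling of links from a start URL, then extraction of title, keywords, and description from each page's source) can be illustrated with a minimal sketch. It uses a plain depth-limited Depth First Search, not the authors' modified version, whose details this record does not give. The names PageParser, crawl, fetch, max_depth, and the example.test pages are assumptions made for illustration; fetch stands in for whatever HTTP client a real crawler would use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class PageParser(HTMLParser):
    """Collects outgoing links plus title, keywords and description (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.meta = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(self.base_url, attrs["href"]))
            self.links.append(absolute)
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.meta["title"] = self.meta["title"].strip()

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] += data


def crawl(start_url, fetch, max_depth=2):
    """Depth-first crawl; returns {url: metadata} in visiting order.

    fetch is an assumed callable mapping a URL to its HTML source (or None).
    """
    index = {}
    visited = set()
    stack = [(start_url, 0)]
    while stack:
        url, depth = stack.pop()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser(url)
        parser.feed(html)
        index[url] = parser.meta
        # Push in reverse so the first link on the page is explored first.
        for link in reversed(parser.links):
            if link not in visited:
                stack.append((link, depth + 1))
    return index


# A tiny in-memory "web" (made-up pages) so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<title>Home</title><meta name="keywords" content="a, b">'
                            '<a href="/x">x</a><a href="/y">y</a>',
    "http://example.test/x": '<title>X</title><meta name="description" content="Page X">'
                             '<a href="/z">z</a>',
    "http://example.test/y": "<title>Y</title>",
    "http://example.test/z": '<title>Z</title><a href="/">home</a>',
}

index = crawl("http://example.test/", PAGES.get)
for url, meta in index.items():
    print(url, meta)
```

Because the stack is last-in, first-out, the crawl follows the first link of each page all the way down before returning to its siblings, which gives the hierarchical scan the description mentions. A production crawler would also need politeness rules such as robots.txt handling and rate limiting, which this sketch leaves out.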
