An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling
Main authors: | Devi, R. Suganya; Manjula, D.; Siddharth, R. K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4475586/ https://www.ncbi.nlm.nih.gov/pubmed/26137592 http://dx.doi.org/10.1155/2015/739286 |
author | Devi, R. Suganya; Manjula, D.; Siddharth, R. K. |
---|---|
collection | PubMed |
description | Web Crawling has acquired tremendous significance in recent times, and it is closely associated with the substantial development of the World Wide Web. Web Search Engines face new challenges due to the availability of vast amounts of web documents, which make the retrieved results less applicable to the analysers. However, recent Web Crawling approaches focus solely on obtaining the links of the corresponding documents. Today, there exist various algorithms and software tools that crawl links from the web; these links must be further processed for future use, which increases the load on the analyser. This paper concentrates on crawling the links and retrieving all the information associated with them to facilitate easy processing for other uses. In this paper, the links are first crawled from the specified uniform resource locator (URL) using a modified version of the Depth First Search algorithm, which allows complete hierarchical scanning of the corresponding web links. The linked pages are then accessed via their source code, and their metadata, such as title, keywords, and description, are extracted. This content is essential for any type of analysis work to be carried out on the Big Data obtained as a result of Web Crawling. |
format | Online Article Text |
id | pubmed-4475586 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-4475586 2015-07-01 ScientificWorldJournal Research Article Hindawi Publishing Corporation 2015 2015-06-07 /pmc/articles/PMC4475586/ /pubmed/26137592 http://dx.doi.org/10.1155/2015/739286 Text en Copyright © 2015 R. Suganya Devi et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4475586/ https://www.ncbi.nlm.nih.gov/pubmed/26137592 http://dx.doi.org/10.1155/2015/739286 |
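
The abstract describes two steps: a depth-first traversal of hyperlinks starting from a seed URL, followed by extraction of each page's title, keywords, and description from its source code. The Python sketch below illustrates that pipeline under stated assumptions. It is a plain depth-limited DFS, not the authors' modified algorithm, whose details are not given in this record. The `fetch` callback, the same-host restriction, the `max_depth` limit, and the names `PageParser`, `crawl_dfs`, and `demo_site` are all hypothetical. Only the standard library is used, and pages are served from an in-memory dictionary so the example runs offline.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Set, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collects <a href> targets plus title, keywords and description metadata."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.meta: Dict[str, str] = {}
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            self.links.append(attr["href"])
        elif tag == "meta":
            name = (attr.get("name") or "").lower()
            content = attr.get("content")
            if name in ("keywords", "description") and content is not None:
                self.meta[name] = content.strip()
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def metadata(self) -> Dict[str, str]:
        meta = dict(self.meta)
        title = "".join(self._title_parts).strip()
        if title:
            meta["title"] = title
        return meta


def crawl_dfs(
    start_url: str,
    fetch: Callable[[str], Optional[str]],
    max_depth: int = 3,
) -> Dict[str, Dict[str, str]]:
    """Hypothetical depth-first crawl limited to the seed's host.
    Returns {url: metadata} in the order pages were visited."""
    host = urlparse(start_url).netloc
    seen: Set[str] = set()
    pages: Dict[str, Dict[str, str]] = {}
    stack: List[Tuple[str, int]] = [(start_url, 0)]  # explicit LIFO stack = DFS
    while stack:
        url, depth = stack.pop()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:  # fetch failed or page missing
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        pages[url] = parser.metadata()
        children = []
        for href in parser.links:
            child = urldefrag(urljoin(url, href))[0]  # absolute URL, no #fragment
            if urlparse(child).netloc == host and child not in seen:
                children.append(child)
        # Push in reverse so the first link on the page is explored first.
        for child in reversed(children):
            stack.append((child, depth + 1))
    return pages


# Offline demo: a hypothetical four-page site served from a dict.
demo_site = {
    "http://example.com/": (
        '<title>Home</title><meta name="keywords" content="crawl, index">'
        '<a href="/a">A</a><a href="/b">B</a><a href="http://other.org/">X</a>'
    ),
    "http://example.com/a": (
        '<title>Page A</title><meta name="description" content="About A">'
        '<a href="/a/deep#top">deep</a>'
    ),
    "http://example.com/a/deep": '<title>Deep</title><a href="/">home</a>',
    "http://example.com/b": "<title>Page B</title>",
}

if __name__ == "__main__":
    for page_url, meta in crawl_dfs("http://example.com/", demo_site.get).items():
        print(page_url, meta)
```

With the demo site, pages are visited in the order `/`, `/a`, `/a/deep`, `/b`: the crawler follows the first link down to its deepest descendant before backtracking, and the off-host link to `other.org` is skipped. An explicit stack is used instead of recursion so that deep link hierarchies do not hit Python's recursion limit; a real deployment would swap `demo_site.get` for an HTTP fetcher.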