J4


Improved Shark-Search algorithm based on page segmentation

CHEN Jun¹, CHEN Zhu-min²

  1. Network Center, Shandong University, Jinan 250100, Shandong; 2. School of Computer Science and Technology, Shandong University, Jinan 250061, Shandong
  • Received:1900-01-01 Revised:1900-01-01 Online:2006-10-24 Published:2006-10-24
  • Contact: CHEN Jun

Abstract: The Shark-Search algorithm is one of the classical algorithms for focused crawling. However, it performs poorly when crawling Web pages that contain many noisy links. An improved Shark-Search algorithm based on page segmentation is proposed, which evaluates relevance more accurately at three granularities: page, block, and single link. Several experiments were carried out to verify that the improved Shark-Search algorithm achieves significantly higher crawling efficiency than the traditional ones.

Key words: relevance computation, page segmentation, focused crawling, Shark-Search algorithm

CLC Number: TP391
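
The abstract names three granularities of relevance (page, block, and single link) but does not give the scoring formulas. The following is only a minimal sketch of how such scores might be blended into one link priority. The function names, the term-frequency cosine similarity, and the weights `w_page`, `w_block`, and `w_link` are all assumptions made for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def link_priority(topic, page_text, block_text, anchor_text,
                  w_page=0.2, w_block=0.3, w_link=0.5):
    """Hypothetical weighted blend of page-, block-, and link-level relevance.

    The weights are illustrative assumptions, not values from the paper.
    """
    return (w_page * cosine_similarity(topic, page_text)
            + w_block * cosine_similarity(topic, block_text)
            + w_link * cosine_similarity(topic, anchor_text))


# Two links on the same page: one in an on-topic block, one in a noisy block.
topic = "focused web crawling"
page_text = "focused web crawling survey advertisement casino sports news"
relevant = link_priority(topic, page_text, "focused web crawling survey", "focused crawling")
noisy = link_priority(topic, page_text, "advertisement casino sports news", "casino")
print(relevant > noisy)
```

In this sketch both links inherit the same page-level score, so the block-level and link-level terms are what separate a link in an on-topic block from one in a noisy block on the same page.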