JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2015, Vol. 50, Issue (05): 1-6. doi: 10.6040/j.issn.1671-9352.3.2014.155

An improved genetic algorithm in the application of Web spider

ZHANG Jing1,2, XIAO Zhi-bin1,2, RONG Hui3, CUI Yi4   

    1. Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
    2. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
    3. Faculty of Art and Design, Kunming Metallurgy College, Kunming 650033, Yunnan, China;
    4. Yunnan Academy of Scientific and Technical Information, Kunming 650051, Yunnan, China
  • Received:2014-09-19 Online:2015-05-20 Published:2015-05-29

Abstract: To further improve the information acquisition efficiency of Web spiders on the Internet, the IoT, and industrial real-time control networks, the causes that lead a Web spider into a local optimum are analyzed. The genetic algorithm (GA), which searches for the global optimum, is then applied to the Web spider. To avoid the slow convergence and premature convergence of the pure GA, an improved algorithm is proposed that refines the three basic GA operators: selection, crossover, and mutation. Experimental results show that, when combined with a Web spider, the improved algorithm overcomes these problems, and that both the recall ratio and the retrieval precision increase.

Key words: Web spider, genetic algorithm, premature convergence, self-adaptive

CLC Number: TP18
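
The abstract does not give the formulas of the refined operators. As a rough illustration of what "self-adaptive" commonly means for a GA, the sketch below uses the adaptive crossover and mutation probabilities of Srinivas and Patnaik (1994). This is an assumption made for illustration and is not necessarily the scheme proposed in this paper. The function names and the constants k1 to k4 are hypothetical. The second function shows the standard definitions of the precision and recall measures mentioned in the abstract.

```python
def adaptive_rates(f_cross, f_mut, f_max, f_avg,
                   k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (pc, pm) in the style of Srinivas and Patnaik (1994).

    f_cross: larger fitness of the two parents chosen for crossover
    f_mut:   fitness of the individual considered for mutation
    f_max:   best fitness in the current population
    f_avg:   mean fitness of the current population
    k1..k4:  illustrative constants (assumed, not taken from the paper)
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Population has converged: fall back to the maximum rates
        # so that diversity can be reintroduced.
        return k3, k4
    # Above-average individuals get lower rates (they are protected);
    # below-average individuals get the full rates (they are disrupted).
    pc = k1 * (f_max - f_cross) / spread if f_cross >= f_avg else k3
    pm = k2 * (f_max - f_mut) / spread if f_mut >= f_avg else k4
    return pc, pm


def precision_recall(retrieved, relevant):
    """Standard retrieval precision and recall for two sets of page IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Under this scheme the best individual is never disrupted (both rates are zero for it), while below-average individuals are recombined and mutated at the full rates, which is one common way to counter premature convergence without slowing the search.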