
Journal of Shandong University (Natural Science) ›› 2015, Vol. 50 ›› Issue (05): 1-6. doi: 10.6040/j.issn.1671-9352.3.2014.155

• Articles •

An improved genetic algorithm in the application of Web spider

ZHANG Jing1,2, XIAO Zhi-bin1,2, RONG Hui3, CUI Yi4

  1. Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
  2. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
  3. Faculty of Art and Design, Kunming Metallurgy College, Kunming 650033, Yunnan, China;
  4. Yunnan Academy of Scientific and Technical Information, Kunming 650051, Yunnan, China
  • Received: 2014-09-19 Online: 2015-05-20 Published: 2015-05-29
  • About the author: ZHANG Jing (1974-), male, Ph.D., professor. Research interests: real-time and embedded software, IoT software modeling and design, and cyber-physical systems. E-mail: zhangji0548_cn@sina.com
  • Supported by:
    Natural Science Foundation of Yunnan Province (2014FA029, 2012FB137, 2011FZ202)

Abstract: To further improve the efficiency with which Web spiders collect information on the Internet, the IoT, and real-time industrial control networks, this paper analyzes why a Web spider becomes trapped in a local optimum and introduces the genetic algorithm (GA) into the Web spider to search for the global optimum. To address the premature convergence and slow convergence of the traditional GA, its three core operators (selection, crossover, and mutation) are improved. Comparative experiments show that the improved algorithm, combined with a Web spider, overcomes these problems and achieves higher search recall and search precision.

Key words: Web spider, genetic algorithm, premature convergence, self-adaptive
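The abstract does not spell out how the operators were made self-adaptive, so the following is only a minimal sketch of one widely used scheme, the fitness-dependent crossover and mutation probabilities of Srinivas and Patnaik (1994). It is not presented as the authors' method. The function name `adaptive_rates` and the default constants `k1` to `k4` are assumptions chosen for illustration.

```python
def adaptive_rates(f_parent, f_ind, f_max, f_avg,
                   k1=1.0, k2=0.5, k3=1.0, k4=0.5):
    """Return (p_c, p_m) for one crossover/mutation decision.

    f_parent: the larger fitness of the two parents selected for crossover
    f_ind:    fitness of the individual considered for mutation
    f_max:    best fitness in the current population
    f_avg:    mean fitness of the current population
    k1..k4:   illustrative constants (assumed values, not from the paper)
    """
    spread = f_max - f_avg
    if spread <= 0:
        # Population has converged: fall back to the maximum rates so that
        # crossover and mutation can restore diversity.
        return k3, k4
    # Above-average individuals get rates that shrink as they approach the
    # best fitness; below-average individuals get the full disruptive rates.
    p_c = k1 * (f_max - f_parent) / spread if f_parent >= f_avg else k3
    p_m = k2 * (f_max - f_ind) / spread if f_ind >= f_avg else k4
    return p_c, p_m


# Usage with a toy population whose fitness values are 1, 2, 3 and 6.
fitness = [1.0, 2.0, 3.0, 6.0]
f_max = max(fitness)
f_avg = sum(fitness) / len(fitness)
best_rates = adaptive_rates(f_max, f_max, f_max, f_avg)
weak_rates = adaptive_rates(2.0, 2.0, f_max, f_avg)
```

In this scheme the best individual is protected (both rates drop to zero), while below-average individuals are recombined and mutated at the full rates, which is one way to counter premature convergence without slowing the search.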

CLC Number:

  • TP18