JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2016, Vol. 51, Issue (7): 11-17. doi: 10.6040/j.issn.1671-9352.1.2015.060
LIU Chi, YAN Hong-fei