山东大学学报(理学版) ›› 2015, Vol. 50 ›› Issue (05): 23-29.doi: 10.6040/j.issn.1671-9352.0.2014.306
赵官宝, 刘云
ZHAO Guan-bao, LIU Yun
摘要: 在数据挖掘技术中,关联规则挖掘的关键在于快速、准确地挖掘频繁项集。传统的Apriori类算法在挖掘频繁项集时存在扫描整个事务数据库的次数较多、频繁项集挖掘时间较长的问题。基于位表提出了频繁项集挖掘算法BITXOR,用二进制序列表示项集,通过异或运算判断两个项集是否能连接;在项集连接后,BITXOR算法对初始候选项集进行剪枝操作。仿真结果表明,在相同条件下,与传统的Apriori、FP-growth算法相比,BITXOR算法仅需扫描整个事务数据库一次,频繁项集的挖掘时间明显减少,在密集型数据库条件下性能表现更加显著。
中图分类号:
[1] HAN Jiawei, PEI Jian, YIN Yiwen. Mining frequent patterns without candidate generation[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. New York: ACM, 2000: 1-12. [2] DONG Jie, HAN Min. BitTableFI: an efficient mining frequent itemsets algorithm[J]. Knowledge-Based Systems, 2007, 20:329-335. [3] ZHANG Yan, ZHANG Fan, BAKOS J. Frequent itemset mining on large scale shared memory machines[C]//Proceedings of IEEE International Conference on Cluster Computing.Washington: IEEE Computer Society, 2011: 585-589. [4] 张志刚,黄刘生,金宗安,等. 基于父子等价剪枝策略的最大频繁项集挖掘[J].计算机工程,2013,39(4):219-221,225. ZHANG Zhigang, HUANG Liusheng, JIN Zongan, et al. Maximal frequent itemset mining based on parent-child equivalency pruning strategy[J]. Computer Engineering, 2013, 39(4):219-221,225. [5] TRAN A N, DUONG H V, TRUONG T C, et al. Efficient algorithms for mining frequent itemsets with constraint[C]//Proceedings of the 3rd International Conference on Knowledge and Systems Engineering (KSE). Washington: IEEE Computer Society, 2011: 19- 25. [6] 刘彩苹,毛建频,毛建旭,等.基于格的快速频繁项集挖掘算法[J]. 湖南大学学报:自然科学版,2013,40(10):52-57. LIU Caiping, MAO Jianpin, MAO Jianxu, et al. Lattice-based algorithm for fast mining frequent itemsets[J]. Journal of Hunan University: Natural Sciences, 2013, 40(10):52-57. [7] TRAN A T, NGO T P, NGUYEN K A. An efficient algorithm for discovering maximum length frequent itemsets[C]//Proceedings of the 3rd International Conference on Knowledge and Systems Engineering (KSE). Washington: IEEE Computer Society, 2011: 62-69. [8] SURIYA S, SHANTHARAJAH S P. A hybrid k-DCI and Apriori algorithm for mining frequent itemsets[C]//Proceedings of 2013 International Conference on Circuits, Power and Computing Technologies (ICCPCT). Washington: IEEE Computer Society, 2013: 1059-1064. [9] 张步忠,程玉胜,王则林.基于片上多核的频繁项集并行挖掘算法[J].计算机科学,2014,41(3):55-58. ZHANG Buzhong, CHENG Yusheng, WANG Zelin. Frequent itemset mining parallel algorithm based on chip multi-core[J]. Computer Science, 2014, 41(3):55-58. [10] VO B, COENEN F, LE T, et al. A hybrid Approach for mining frequent itemsets[C]//Proceedings of 2013 IEEE International Conference on Systems, Man, and Cybernetics. Washington: IEEE Computer Society, 2013: 4647-4651. |
[1] | 王开荣,马琳. 广义几何规划的加速全局优化算法[J]. J4, 2013, 48(1): 72-77. |
[2] | 李贵,韩子扬,郑新录,李征宇. 基于Apriori算法的Deep Web网页关系挖掘研究[J]. J4, 2011, 46(5): 67-70. |
[3] | 张文东1,尹金焕1,贾晓飞2,黄超1,苑衍梅1. 基于向量的频繁项集挖掘算法研究[J]. J4, 2011, 46(3): 31-34. |
[4] | 娄兰芳,潘庆先 . 基于集合运算的频繁集挖掘优化算法[J]. J4, 2008, 43(11): 54-57 . |
[5] | 刘兴涛,石 冰,解英文 . 挖掘关联规则中Apriori算法的一种改进[J]. J4, 2008, 43(11): 67-71 . |
[6] | 郭跃斌,翟延富,董祥军*,杨越越,李 刚 . 基于序列模式的正负关联规则研究[J]. J4, 2007, 42(9): 88-90 . |
[7] | 陈 华,陆黎明,刘玉文 . 基于Web数据挖掘的文献个性化推荐系统的设计[J]. J4, 2007, 42(11): 69-72 . |
[8] | 宋春芳,石冰 . 一种基于关联规则的搜索引擎结果聚类算法[J]. J4, 2006, 41(3): 61-65 . |
|