您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

山东大学学报(理学版) ›› 2015, Vol. 50 ›› Issue (05): 23-29.doi: 10.6040/j.issn.1671-9352.0.2014.306

• 论文 • 上一篇    下一篇

一种基于位表的有效频繁项集挖掘算法

赵官宝, 刘云   

  1. 昆明理工大学信息工程与自动化学院, 云南 昆明 650500
  • 收稿日期:2014-07-04 出版日期:2015-05-20 发布日期:2015-05-29
  • 通讯作者: 刘云(1973-),男,副教授,研究方向为无线通信.E-mail:liuyun@kmust.edu.cn E-mail:liuyun@kmust.edu.cn
  • 作者简介:赵官宝(1989-),男,硕士研究生,研究方向为数据挖掘、无线传感网.E-mail:774699334@qq.com
  • 基金资助:
    国家自然基金资助项目(61262040)

An efficient bittable based frequent itemsets mining algorithm

ZHAO Guan-bao, LIU Yun   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received:2014-07-04 Online:2015-05-20 Published:2015-05-29

摘要: 在数据挖掘技术中,关联规则挖掘的关键在于快速、准确地挖掘频繁项集。传统的Apriori类算法在挖掘频繁项集时存在扫描整个事务数据库的次数较多、频繁项集挖掘时间较长的问题。基于位表提出了频繁项集挖掘算法BITXOR,用二进制序列表示项集,通过异或运算判断两个项集是否能连接;在项集连接后,BITXOR算法对初始候选项集进行剪枝操作。仿真结果表明,在相同条件下,与传统的Apriori、FP-growth算法相比,BITXOR算法仅需扫描整个事务数据库一次,频繁项集的挖掘时间明显减少,在密集型数据库条件下性能表现更加显著。

关键词: 频繁项集, 剪枝, 位表, 关联规则, 异或

Abstract: Fast and accurate mining frequent item set is the key of mining association rules in data mining techniques. The traditional Apriori-like algorithms need to scan the entire transaction database many times and spend too long time for mining frequent itemsets. Based on the efficient bittable, an based frequent itemsets mining algorithm BITXOR was proposed, which uses the bit table and represents item sets with binary sequence. BITXOR judges whether two items can be connected by the sequence of binary XOR. After the connection of item set, BITXOR also carries out pruning operation on the initial candidate sets. The simulation results show, compared with the traditional Apriori algorithm and FP-growth algorithm under the same conditions, BITXOR algorithm scans the entire transaction database only once and significantly reduces the time of mining frequent itemsets. The performances of BITXOR are more significant in conditions of intensive database.

Key words: association rules, frequent itemsets, XOR, pruning, bittable

中图分类号: 

  • TP274
[1] HAN Jiawei, PEI Jian, YIN Yiwen. Mining frequent patterns without candidate generation[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data. New York: ACM, 2000: 1-12.
[2] DONG Jie, HAN Min. BitTableFI: an efficient mining frequent itemsets algorithm[J]. Knowledge-Based Systems, 2007, 20:329-335.
[3] ZHANG Yan, ZHANG Fan, BAKOS J. Frequent itemset mining on large scale shared memory machines[C]//Proceedings of IEEE International Conference on Cluster Computing.Washington: IEEE Computer Society, 2011: 585-589.
[4] 张志刚,黄刘生,金宗安,等. 基于父子等价剪枝策略的最大频繁项集挖掘[J].计算机工程,2013,39(4):219-221,225. ZHANG Zhigang, HUANG Liusheng, JIN Zongan, et al. Maximal frequent itemset mining based on parent-child equivalency pruning strategy[J]. Computer Engineering, 2013, 39(4):219-221,225.
[5] TRAN A N, DUONG H V, TRUONG T C, et al. Efficient algorithms for mining frequent itemsets with constraint[C]//Proceedings of the 3rd International Conference on Knowledge and Systems Engineering (KSE). Washington: IEEE Computer Society, 2011: 19- 25.
[6] 刘彩苹,毛建频,毛建旭,等.基于格的快速频繁项集挖掘算法[J]. 湖南大学学报:自然科学版,2013,40(10):52-57. LIU Caiping, MAO Jianpin, MAO Jianxu, et al. Lattice-based algorithm for fast mining frequent itemsets[J]. Journal of Hunan University: Natural Sciences, 2013, 40(10):52-57.
[7] TRAN A T, NGO T P, NGUYEN K A. An efficient algorithm for discovering maximum length frequent itemsets[C]//Proceedings of the 3rd International Conference on Knowledge and Systems Engineering (KSE). Washington: IEEE Computer Society, 2011: 62-69.
[8] SURIYA S, SHANTHARAJAH S P. A hybrid k-DCI and Apriori algorithm for mining frequent itemsets[C]//Proceedings of 2013 International Conference on Circuits, Power and Computing Technologies (ICCPCT). Washington: IEEE Computer Society, 2013: 1059-1064.
[9] 张步忠,程玉胜,王则林.基于片上多核的频繁项集并行挖掘算法[J].计算机科学,2014,41(3):55-58. ZHANG Buzhong, CHENG Yusheng, WANG Zelin. Frequent itemset mining parallel algorithm based on chip multi-core[J]. Computer Science, 2014, 41(3):55-58.
[10] VO B, COENEN F, LE T, et al. A hybrid Approach for mining frequent itemsets[C]//Proceedings of 2013 IEEE International Conference on Systems, Man, and Cybernetics. Washington: IEEE Computer Society, 2013: 4647-4651.
[1] 王开荣,马琳. 广义几何规划的加速全局优化算法[J]. J4, 2013, 48(1): 72-77.
[2] 李贵,韩子扬,郑新录,李征宇. 基于Apriori算法的Deep Web网页关系挖掘研究[J]. J4, 2011, 46(5): 67-70.
[3] 张文东1,尹金焕1,贾晓飞2,黄超1,苑衍梅1. 基于向量的频繁项集挖掘算法研究[J]. J4, 2011, 46(3): 31-34.
[4] 娄兰芳,潘庆先 . 基于集合运算的频繁集挖掘优化算法[J]. J4, 2008, 43(11): 54-57 .
[5] 刘兴涛,石 冰,解英文 . 挖掘关联规则中Apriori算法的一种改进[J]. J4, 2008, 43(11): 67-71 .
[6] 郭跃斌,翟延富,董祥军*,杨越越,李 刚 . 基于序列模式的正负关联规则研究[J]. J4, 2007, 42(9): 88-90 .
[7] 陈 华,陆黎明,刘玉文 . 基于Web数据挖掘的文献个性化推荐系统的设计[J]. J4, 2007, 42(11): 69-72 .
[8] 宋春芳,石冰 . 一种基于关联规则的搜索引擎结果聚类算法[J]. J4, 2006, 41(3): 61-65 .
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!