Research of a frequent itemsets mining algorithm based on vector

J4, 2011, Vol. 46, Issue (3): 31-34.

ZHANG Wen-dong¹, YIN Jin-huan¹, JIA Xiao-fei², HUANG Chao¹, YUAN Yan-mei¹

1. School of Computer and Communication Engineering, China University of Petroleum (East China), Dongying 257061, Shandong, China;
2. Bohai Oilfield Exploration and Development Research Institute, Tianjin Branch of CNOOC Ltd., Tianjin 300452, China

Received: 2010-03-28 Published: 2011-04-21
About the author: ZHANG Wen-dong (1963- ), male, senior engineer and master's supervisor. His main research interests are databases and data mining. Email: zhangwend@126.com

Abstract

The Apriori algorithm must scan the transaction database many times when searching for frequent itemsets, and it may generate a large number of candidate itemsets. To address both problems, a frequent itemset mining algorithm that combines vectors and arrays is proposed. The algorithm scans the transaction database only once, avoids pattern matching, and reduces the generation of worthless candidate itemsets. Comparison with existing algorithms shows that the proposed algorithm achieves higher mining efficiency, and that its advantage becomes more pronounced as the number of items in the database grows.

Key words: data mining; association rules; Apriori algorithm; frequent itemsets
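
The abstract does not spell out the vector and array structures, so the following is only a minimal sketch of the general idea, under stated assumptions: one pass over the database builds a bit vector per item (bit t is set when transaction t contains the item), and the support of any itemset is then the number of set bits in the bitwise AND of its items' vectors. This is the generic vertical bit-vector technique, not necessarily the authors' algorithm; the function names `build_item_vectors` and `frequent_itemsets` and the sample data are hypothetical.

```python
from itertools import combinations


def build_item_vectors(transactions):
    """Scan the transactions once and return {item: bit vector as int}.

    Bit t of an item's vector is 1 if transaction t contains the item.
    """
    vectors = {}
    for t, items in enumerate(transactions):
        for item in set(items):
            vectors[item] = vectors.get(item, 0) | (1 << t)
    return vectors


def popcount(vec):
    """Number of set bits, i.e. the support count of the vector."""
    return bin(vec).count("1")


def frequent_itemsets(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets.

    Assumption: items are mutually comparable (e.g. all strings).
    """
    vectors = build_item_vectors(transactions)  # the only database scan
    level = {
        (item,): vec
        for item, vec in vectors.items()
        if popcount(vec) >= min_count
    }
    result = {}
    while level:
        for itemset, vec in level.items():
            result[itemset] = popcount(vec)
        next_level = {}
        # Join two frequent k-itemsets that share their first k-1 items.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                vec = level[a] & level[b]  # support via AND, no rescan
                if popcount(vec) >= min_count:
                    next_level[a + (b[-1],)] = vec
        level = next_level
    return result


if __name__ == "__main__":
    # Hypothetical sample database of four transactions.
    transactions = [
        ["A", "C", "D"],
        ["B", "C", "E"],
        ["A", "B", "C", "E"],
        ["B", "E"],
    ]
    for itemset, count in sorted(frequent_itemsets(transactions, 2).items()):
        print(itemset, count)
```

Because each candidate's support comes from an AND of vectors already held in memory, the database is read only in `build_item_vectors`, and an infrequent candidate is discarded as soon as it is formed rather than being matched against transactions.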

ZHANG Wen-dong, YIN Jin-huan, JIA Xiao-fei, HUANG Chao, YUAN Yan-mei. Research of a frequent itemsets mining algorithm based on vector[J]. J4, 2011, 46(3): 31-34.
