山东大学学报(理学版) ›› 2014, Vol. 49 ›› Issue (08): 40-47.doi: 10.6040/j.issn.1671-9352.1.2014.114
张聪, 于洪
ZHANG Cong, YU Hong
摘要: 已有的大多数聚类算法都假设数据集保持不变,然而,很多应用中数据集是会随时间变化的。为此,提出了一种新的三支决策软增量聚类算法。采用区间集的形式表示类簇,区间集的上界、边界与下界就对应着三支决策产生的正域、边界域和负域,并提出了一种基于代表点的初始聚类算法。采用同样的方式对新增数据集进行一次预聚类,以消除数据处理顺序对最终聚类结果产生的影响。为了快速查找新增数据的相似区域,建立了代表点搜索树,并且给出了查找和更新搜索树的策略。运用三支决策策略完成增量聚类。实验结果表明提出的增量聚类算法是有效的。
中图分类号:
[1] PHAM D T, DIMOV S S, NGUYEN C D. An incremental K-means algorithm[J]. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 2004, 218(7):783-795. [2] ESTER M, KRIEGEL H P, SANDER J, et al. Incremental clustering for mining in a data warehousing environment[C]//Proceedings of the 24th VLDB Conference. Washington: IEEE Computer Society, 1998: 323-333. [3] GOYAL N, GOYAL P, Venkatramaiah K, et al. An efficient density based incremental clustering algorithm in data warehousing environment[C]//2009 International Conference on Computer Engineering and Applications. Singapore: IACSIT Press, 2009: 482-486. [4] 陈宁, 陈安, 周龙骧. 基于密度的增量式网格聚类算[J]. 软件学报, 2002, 13(1):1-7. CHENG Ning, CHENG An, ZHOU Longxiang. An incremental grid density-based clustering algorithm[J]. Journal of Software, 2002, 13(1):1-7. [5] PATRA B K, VILLE O, LAUNONEN R, et al. Distance based incremental clustering for mining clusters of arbitrary shapes[C]//Pattern Recognition and Machine Intelligence. Berlin Heidelberg: Springer, 2013: 229-236. [6] IBRAHIM R, AHMED N, YOUSRI N A, et al. Incremental mitosis: discovering clusters of arbitrary shapes and densities in dynamic data[C]//2012 11th International Conference on Machine Learning and Applications. Washington: IEEE Computer Society, 2012: 102-107. [7] NING Huazhong, XU Wei, CHI Yun, et al. Incremental spectralclustering by efficiently updating the eigen-system[J]. Pattern Recognition, 2010, 43(1):113-127. [8] GAD W K, KAMEL M S. Incremental clustering algorithm based on phrase-semantic similarity histogram[C]//2010 International Conference on Machine Learning and Cybernetics. Washington: IEEE Computer Society, 2010:2088-2093. [9] GIL-GARCIA R, PONS-PORRATE A. Dynamic hierarchical algorithms for document clustering[J]. Pattern Recognition Letters, 2010, 31(6):469-477. [10] PREZ-SUREA A, MARTINEZ-TRINIDAD J F, CARRASCO-OCHOA J A, et al. An algorithm based on density and compactness for dynamic overlapping Clustering[J]. Pattern Recognition, 2013, 46(11):3040-3055. [11] YAO Yiyu. An outline of a theory of three-way decisions[C]//Rough Sets and Current Trends in Computing. Berlin Heidelberg: Springer, 2012: 1-17. [12] ZHOU Bing, YAO Yiyu, LUO Jigang. Cost-sensitive three-way email spam filtering[J]. Journal of Intelligent Information Systems, 2013: 1-27. [13] AZAM N, YAO Jingtao. Analyzing uncertainties of probabilistic rough set regions with game-theoretic rough sets[J]. International Journal of Approximate Reasoning, 2014, 55(1):142-155. [14] LI Liudun, TIAN Rui, LIANG Decui. Three-way Decisions in Dynamic Decision-theoretic Rough Sets[C]//RoughSets and Knowledge Technology. Berlin Heidelberg: Springer, 2013: 291-301. [15] YAO Yiyu, LINGRAS P, WANG Ruizhi, et al. Interval set cluster analysis: a re-formulation[C]//Rough Sets, FuzzySets, Data Mining and Granular Computing. BerlinHeidelberg: Springer, 2009: 398-405. [16] YU Hong, WANG Ying. Three-way decisions method for overlapping clustering [C]//Rough Sets and Current Trends in Computing. Berlin Heidelberg: Springer, 2012: 277-286. [17] 王爱平, 万国伟, 程志权,等. 支持在线学习的增量式极端随机森林分类器[J]. 软件学报, 2011, 22(9):2059-2074. WANG Aiping, WAN Guowei, CHENG Zhiquan, et al. Incremental learning extremely random forest classifier for online learning[J]. Journal of Software, 2011, 22(9):2059-2074. |
[1] | 刘国涛,张燕平,徐晨初. 一种优化覆盖中心的三支决策模型[J]. 山东大学学报(理学版), 2017, 52(3): 105-110. |
[2] | 田海龙, 朱艳辉, 梁韬, 马进, 刘璟. 基于三支决策的中文微博观点句识别研究[J]. 山东大学学报(理学版), 2014, 49(08): 58-65. |
[3] | 张里博, 李华雄, 周献中, 黄兵. 人脸识别中的多粒度代价敏感三支决策[J]. 山东大学学报(理学版), 2014, 49(08): 48-57. |
[4] | 杜丽娜, 徐久成, 刘洋洋, 孙林. 基于三支决策风险最小化的风险投资评估应用研究[J]. 山东大学学报(理学版), 2014, 49(08): 66-72. |
|