您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

山东大学学报(理学版) ›› 2014, Vol. 49 ›› Issue (08): 40-47.doi: 10.6040/j.issn.1671-9352.1.2014.114

• 论文 • 上一篇    下一篇

一种三支决策软增量聚类算法

张聪, 于洪   

  1. 重庆邮电大学计算智能重庆市重点实验室, 重庆 400065
  • 收稿日期:2009-06-20 修回日期:2014-07-01 发布日期:2014-09-24
  • 通讯作者: 于洪(1972-),女,博士,教授,研究方向为粗糙集、智能信息处理、Web智能、数据挖掘等.E-mail:yuhong@cqupt.edu.cn E-mail:yuhong@cqupt.edu.cn
  • 作者简介:张聪(1988-),男,硕士研究生,研究方向为数据挖掘、聚类分析.E-mail:zhangcong0214@163.com
  • 基金资助:
    国家自然科学基金资助项目(61379114,61272060);重庆市自然科学基金资助项目(cstc2013jcyjA40063)

An incremental three-way decisions soft clustering algorithm

ZHANG Cong, YU Hong   

  1. Chongqing Key Lab of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received:2009-06-20 Revised:2014-07-01 Published:2014-09-24

摘要: 已有的大多数聚类算法都假设数据集保持不变,然而,很多应用中数据集是会随时间变化的。为此,提出了一种新的三支决策软增量聚类算法。采用区间集的形式表示类簇,区间集的上界、边界与下界就对应着三支决策产生的正域、边界域和负域,并提出了一种基于代表点的初始聚类算法。采用同样的方式对新增数据集进行一次预聚类,以消除数据处理顺序对最终聚类结果产生的影响。为了快速查找新增数据的相似区域,建立了代表点搜索树,并且给出了查找和更新搜索树的策略。运用三支决策策略完成增量聚类。实验结果表明提出的增量聚类算法是有效的。

关键词: 软聚类, 搜索树, 增量聚类, 三支决策

Abstract: Most of the clustering algorithms reported assume a data set always does not change. However, it is often observed that the analyzed data set changes over time in many applications. To combat changes, we introduce a new incremental soft clustering approach based on three-way decisions theory. Firstly, the interval sets are used to represent a cluster, wherein the upper bound, the border, the lower bound of interval sets corresponding to positive region, boundary region, negative region generated by the three-way decisions respectively, and an initial clustering algorithm is proposed by using representative points. Secondly, to eliminate the influence of the processing order on final incremental clustering results, the incremental data is pre-clustered used the same way. To quickly search similar areas for incremental data, a searching tree based on the representative points is constructed, and the strategies of searching and updating are presented. Finally, the three-way decisions strategy is used to incremental clustering. The results of the experiments show the approach is effective to incremental clustering.

Key words: incremental clustering, soft clustering, searching tree, three-way decisions

中图分类号: 

  • TP181
[1] PHAM D T, DIMOV S S, NGUYEN C D. An incremental K-means algorithm[J]. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 2004, 218(7):783-795.
[2] ESTER M, KRIEGEL H P, SANDER J, et al. Incremental clustering for mining in a data warehousing environment[C]//Proceedings of the 24th VLDB Conference. Washington: IEEE Computer Society, 1998: 323-333.
[3] GOYAL N, GOYAL P, Venkatramaiah K, et al. An efficient density based incremental clustering algorithm in data warehousing environment[C]//2009 International Conference on Computer Engineering and Applications. Singapore: IACSIT Press, 2009: 482-486.
[4] 陈宁, 陈安, 周龙骧. 基于密度的增量式网格聚类算[J]. 软件学报, 2002, 13(1):1-7. CHENG Ning, CHENG An, ZHOU Longxiang. An incremental grid density-based clustering algorithm[J]. Journal of Software, 2002, 13(1):1-7.
[5] PATRA B K, VILLE O, LAUNONEN R, et al. Distance based incremental clustering for mining clusters of arbitrary shapes[C]//Pattern Recognition and Machine Intelligence. Berlin Heidelberg: Springer, 2013: 229-236.
[6] IBRAHIM R, AHMED N, YOUSRI N A, et al. Incremental mitosis: discovering clusters of arbitrary shapes and densities in dynamic data[C]//2012 11th International Conference on Machine Learning and Applications. Washington: IEEE Computer Society, 2012: 102-107.
[7] NING Huazhong, XU Wei, CHI Yun, et al. Incremental spectralclustering by efficiently updating the eigen-system[J]. Pattern Recognition, 2010, 43(1):113-127.
[8] GAD W K, KAMEL M S. Incremental clustering algorithm based on phrase-semantic similarity histogram[C]//2010 International Conference on Machine Learning and Cybernetics. Washington: IEEE Computer Society, 2010:2088-2093.
[9] GIL-GARCIA R, PONS-PORRATE A. Dynamic hierarchical algorithms for document clustering[J]. Pattern Recognition Letters, 2010, 31(6):469-477.
[10] PREZ-SUREA A, MARTINEZ-TRINIDAD J F, CARRASCO-OCHOA J A, et al. An algorithm based on density and compactness for dynamic overlapping Clustering[J]. Pattern Recognition, 2013, 46(11):3040-3055.
[11] YAO Yiyu. An outline of a theory of three-way decisions[C]//Rough Sets and Current Trends in Computing. Berlin Heidelberg: Springer, 2012: 1-17.
[12] ZHOU Bing, YAO Yiyu, LUO Jigang. Cost-sensitive three-way email spam filtering[J]. Journal of Intelligent Information Systems, 2013: 1-27.
[13] AZAM N, YAO Jingtao. Analyzing uncertainties of probabilistic rough set regions with game-theoretic rough sets[J]. International Journal of Approximate Reasoning, 2014, 55(1):142-155.
[14] LI Liudun, TIAN Rui, LIANG Decui. Three-way Decisions in Dynamic Decision-theoretic Rough Sets[C]//RoughSets and Knowledge Technology. Berlin Heidelberg: Springer, 2013: 291-301.
[15] YAO Yiyu, LINGRAS P, WANG Ruizhi, et al. Interval set cluster analysis: a re-formulation[C]//Rough Sets, FuzzySets, Data Mining and Granular Computing. BerlinHeidelberg: Springer, 2009: 398-405.
[16] YU Hong, WANG Ying. Three-way decisions method for overlapping clustering [C]//Rough Sets and Current Trends in Computing. Berlin Heidelberg: Springer, 2012: 277-286.
[17] 王爱平, 万国伟, 程志权,等. 支持在线学习的增量式极端随机森林分类器[J]. 软件学报, 2011, 22(9):2059-2074. WANG Aiping, WAN Guowei, CHENG Zhiquan, et al. Incremental learning extremely random forest classifier for online learning[J]. Journal of Software, 2011, 22(9):2059-2074.
[1] 刘国涛,张燕平,徐晨初. 一种优化覆盖中心的三支决策模型[J]. 山东大学学报(理学版), 2017, 52(3): 105-110.
[2] 田海龙, 朱艳辉, 梁韬, 马进, 刘璟. 基于三支决策的中文微博观点句识别研究[J]. 山东大学学报(理学版), 2014, 49(08): 58-65.
[3] 张里博, 李华雄, 周献中, 黄兵. 人脸识别中的多粒度代价敏感三支决策[J]. 山东大学学报(理学版), 2014, 49(08): 48-57.
[4] 杜丽娜, 徐久成, 刘洋洋, 孙林. 基于三支决策风险最小化的风险投资评估应用研究[J]. 山东大学学报(理学版), 2014, 49(08): 66-72.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!