《山东大学学报(理学版)》 ›› 2025, Vol. 60 ›› Issue (1): 29-44.doi: 10.6040/j.issn.1671-9352.7.2023.4130
• • 上一篇
张春昊1,解滨1,2,3*,徐童童1,张喜梅1
ZHANG Chunhao1, XIE Bin1,2,3*, XU Tongtong1, ZHANG Ximei1
摘要: 结合自然邻居搜索算法改进了密度峰值聚类(clustering by fast search and find of density peaks, CFSFDP)算法存在的一系列问题,提出基于自然邻居搜索优化策略的密度峰值聚类(density peak clustering algorithm optimized by natural neighbor search, NaN-CFSFDP)算法。基于自然邻居搜索算法提出了一种离群样本的检测方法,针对CFSFDP算法中截断距离dc人工准确取值较难的问题,结合自然邻居搜索算法改进了dc的计算方式,实现了dc的自动取值。重新设计并统一了CFSFDP算法的样本密度度量规则,使其更关注每个样本的局部信息。由于数据集中因类簇间的密度差异大,密度峰值点集中于稠密簇使得簇丢失,因此提出样本共享自然邻居和类簇共享自然邻居的概念,构造新的类簇融合算法。合成数据集和真实数据集上的实验结果表明,在大多数情况下,NaN-CFSFDP算法在聚类性能上优于或至少与比较方法相当,且与CFSFDP算法及其改进算法相比参数更少。
中图分类号:
[1] OKTAR Y, TURKAN M. A review of sparsity-based clustering methods[J]. Signal Processing, 2018, 148:20-30. [2] SAXENA A, PRASAD M, GUPTA A, et al. A review of clustering techniques and developments[J]. Neurocomputing, 2017, 267:664-681. [3] MACQUEEN J B. Some methods for classification and analysis of multivariate observations[C] //Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1967:281-297. [4] BEZDEK J C, EHRLICH R, FULL W. FCM: the fuzzy c-means clustering algorithm[J]. Computers and Geosciences, 1984, 10(2/3):191-203. [5] WANG Wei, YANG Jiong, MUNTZ R. STING: a statistical information grid approach to spatial data mining[C] //Proceedings of the 23rd International Conference on Very Large Data Bases. Athens: ACM, 1997:186-195. [6] GUHA S, RASTOGI R, SHIM K. Cure: an efficient clustering algorithm for large databases[J]. Information Systems, 2001, 26(1):35-58. [7] ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C] //Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. Portland Oregon, AAAI, 1996:226-231. [8] RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496. [9] 谢娟英,高红超,谢维信. K近邻优化的密度峰值快速搜索聚类算法[J]. 中国科学(信息科学), 2016, 46(2):258-280. XIE Juanying, GAO Hongchao, XIE Weixin. K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset[J]. Scientia Sinica(Informationis), 2016, 46(2):258-280. [10] XIE Juanying, GAO Hongchao, XIE Weixin, et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J]. Information Sciences, 2016, 354:19-40. [11] JIANG Jianhua, CHEN Yujun, MENG Xianqiu, et al. A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process[J]. Physica A, 2019, 523:702-713. [12] ZHANG Rui, DU Tao, QU Shouning, et al. Adaptive density-based clustering algorithm with shared KNN conflict game[J]. Information Sciences, 2021, 565:344-369. [13] LIU Yaohui, MA Zhengming, YU Fang. Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy[J]. Knowledge-based Systems, 2017, 133:208-220. [14] BAI Liang, CHENG Xueqi, LIANG Jiye, et al. Fast density clustering strategies based on the k-means algorithm[J]. Pattern Recognition, 2017, 71:375-386. [15] LIU Rui, WANG Hong, YU Xiaomei. Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450:200-226. [16] 张新元,贠卫国. 共享K近邻和多分配策略的密度峰值聚类算法[J]. 小型微型计算机系统, 2023, 44(1):75-82. ZHANG Xinyuan, YUN Weiguo. Sharing K-nearest neighbors and multiple assignment policies density peaks clustering algorithm[J]. Journal of Chinese Computer Systems, 2023, 44(1):75-82. [17] ZHANG Chunhao, XIE Bin, ZHANG Yiran. Reverse-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Chinese Journal of Electronics, 2023, 32(6):1341-1354. [18] 徐童童,解滨,张喜梅,等. 自适应聚类中心策略优化的密度峰值聚类算法[J]. 计算机工程与应用,2023,59(21):91-101. XU Tongtong, XIE Bin, ZHANG Ximei, et al. Density peak clustering algorithm optimized by adaptive clustering centers strategy[J]. Computer Engineering and Applications, 2023, 59(21):91-101. [19] 张春昊,解滨,张喜梅,等. 一种结合自适应近邻与密度峰值的加权模糊聚类算法[J]. 小型微型计算机系统,2023, 44(9):1974-1982. ZHANG Chunhao, XIE Bin, ZHANG Ximei, et al. Weighted fuzzy clustering algorithm combining adaptive nearest neighbors and density peaks[J]. Journal of Chinese Computer Systems, 2023, 44(9):1974-1982. [20] CHEN Yewang, TANG Shengyu, PEI Songwen. DHeat: a density heat-based algorithm for clustering with effective radius[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018, 48(4):649-660. [21] BRYANT A, CIOS K. RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 30(6):1109-1121. [22] ZHU Qingsheng, FENG Ji, HUANG Jinlong. Natural neighbor: a self-adaptive neighborhood method without parameter K[J]. Pattern Recognition Letters, 2016, 80:30-36. [23] HUANG Jinlong, ZHU Qingsheng, YANG Lijun, et al. A non-parameter outlier detection algorithm based on natural neighbor[J]. Knowledge-based Systems, 2016, 92:71-77. [24] YANG Lijun, ZHU Qingsheng, HUANG Jinlong, et al. Adaptive edited natural neighbor algorithm[J]. Neurocomputing, 2017, 230:427-433. [25] CHENG Dongdong, ZHU Qingsheng, HUANG Jinlong, et al. Natural neighbor-based clustering algorithm with local representatives[J]. Knowledge-based Systems, 2017, 123:238-253. [26] 陈叶旺,申莲莲,钟才明,等. 密度峰值聚类算法综述[J]. 计算机研究与发展, 2020, 57(2):378-394. CHEN Yewang, SHEN Lianlian, ZHONG Caiming, et al. Survey on density peak clustering algorithm[J]. Journal of Computer Research and Development, 2020, 57(2):378-394. [27] FU Limin, MEDICO E. Flame: a novel fuzzy clustering method for the analysis of DNA microarray data[J]. BMC Bioinformatics, 2007, 8(3):1-15. [28] GIONIS A, MANNILA H, TSAPARAS P. Clustering aggregation[J]. ACM Transactions on Knowledge Discovery from Data, 2007, 1(1):1-30. [29] JAIN A K, LAW M H C. Data clustering: a users dilemma[C] //International Conference on Pattern Recognition & Machine Intelligence. Kolkata: Springer, 2005:1-10. [30] DING Shifei, DU Mingjing, SUN Tongfeng, et al. An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood[J]. Knowledge-based Systems, 2017, 133:294-313. [31] HUBERT L, ARABIE P. Comparing partitions[J]. Journal of Classification, 1985, 2:193-218. [32] CHANG H, YEUNG D Y. Robust path-based spectral clustering[J]. Pattern Recognition, 2008, 41(1):191-203. [33] VEENMAN C J, REINDERS M J T, BACKER E. A maximum variance cluster algorithm[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(9):1273-1280. [34] ZAHN C T. Graph-theoretical methods for detecting and describing gestalt clusters[J]. IEEE Transactions on Computers, 1971, C-20(1):68-86. [35] REZAE M, FRANTI P. Set matching measures for external cluster validity[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(8):2173-2186. |
[1] | 罗奇,苟刚. 基于聚类和群组归一化的多模态对话情绪识别[J]. 《山东大学学报(理学版)》, 2024, 59(7): 105-112. |
[2] | 朱金,付玉,管文瑞,王平心. 基于自然最近邻的样本扰动三支聚类[J]. 《山东大学学报(理学版)》, 2024, 59(5): 45-51. |
[3] | 郑晨颖,陈颖悦,侯贤宇,江连吉,廖亮. 一种邻域粒的模糊C均值聚类算法[J]. 《山东大学学报(理学版)》, 2024, 59(5): 35-44. |
[4] | 孙嘉睿,杜明晶. 模糊边界剥离聚类[J]. 《山东大学学报(理学版)》, 2024, 59(3): 27-36, 50. |
[5] | 罗兴隆,贺兴时,周洁,杨新社. 基于非洲秃鹫优化算法改进的密度峰值聚类[J]. 《山东大学学报(理学版)》, 2024, 59(1): 46-55,71. |
[6] | 金鑫,于非凡,戴雨桐,李兹谦,邹永魁. 基于聚类分析和鉴别信息的教学效果评价模型分析[J]. 《山东大学学报(理学版)》, 2023, 58(7): 115-120. |
[7] | 徐华畅,许倩,赵钰琳,梁峰宁,徐凯,朱红. 基于改进EfficientNetV2的脑胶质瘤IDH1突变状态预测方法[J]. 《山东大学学报(理学版)》, 2023, 58(7): 60-66. |
[8] | 马慧,魏立力. 基于犹豫三角模糊相关系数的聚类分析[J]. 《山东大学学报(理学版)》, 2023, 58(12): 118-126. |
[9] | 凡嘉琛,王平心,杨习贝. 基于三支决策的密度敏感谱聚类[J]. 《山东大学学报(理学版)》, 2023, 58(1): 59-66. |
[10] | 李心雨,范辉,刘惊雷. 基于自适应图调节和低秩矩阵分解的鲁棒聚类[J]. 《山东大学学报(理学版)》, 2022, 57(8): 21-38. |
[11] | 柳利芳,马园园. 基于多视角对称非负矩阵分解的跨模态信息检索方法[J]. 《山东大学学报(理学版)》, 2022, 57(7): 65-72. |
[12] | 孙林,梁娜,徐久成. 基于自适应邻域互信息与谱聚类的特征选择[J]. 《山东大学学报(理学版)》, 2022, 57(12): 13-24. |
[13] | 武祺然,周力凯,孙金金,王念鸽,余群芳. 浙江省空气质量变化特征研究——基于函数型数据分析[J]. 《山东大学学报(理学版)》, 2021, 56(7): 53-64. |
[14] | 杨婷,朱恒东,马盈仓,汪义瑞,杨小飞. 基于L2,1范数和流形正则项的半监督谱聚类算法[J]. 《山东大学学报(理学版)》, 2021, 56(3): 67-76. |
[15] | 张一鸣,王国胤,胡军,傅顺. 基于密度峰值和网络嵌入的重叠社区发现[J]. 《山东大学学报(理学版)》, 2021, 56(1): 91-102. |
|