基于非洲秃鹫优化算法改进的密度峰值聚类

doi:10.6040/j.issn.1671-9352.0.2022.623

摘要/Abstract

摘要：

密度峰值聚类算法是一种自动寻找簇中心的新型快速搜索算法。针对其截断距离的不确定和一步式分配策略不稳健等缺陷, 本文提出一种基于非洲秃鹫优化算法改进的密度峰值聚类算法。通过准确率(accuracy, Acc)这一评价指标建立优化问题的目标函数, 利用非洲秃鹫优化算法强大的寻优能力对不确定的截断距离d_c进行优化, 降低了人为取值的不准确性; 其次, 根据数据集密度均值将其划分为高低不同的密度区域, 对不同区域采用不同的分配策略, 针对高密度区域内的数据点采用与原密度峰值聚类相同的分配方法, 对低密度区域内数据点则根据其k近邻数量进行聚类; 最后, 将该算法在合成和真实数据集上进行实验验证, 算法的聚类性能有了很大的提升, 且对密度差异性较大的数据集划分也更加精确。

关键词: 非洲秃鹫优化算法, 密度峰值, 截断距离, 分配策略

Abstract:

Density peak clustering algorithm is a new fast search algorithm for automatically finding cluster centers. Aiming at the uncertainty of its cut-off distance and the instability of the one-step allocation strategy, an improved density peak clustering approach based on African vultures optimization algorithm is proposed. The objective function of the optimization problem is established through evaluating accuracy(Acc), and the uncertain cut-off distance d_c is optimized by the powerful optimization ability of the African vultures optimization algorithm, which reduces the inaccuracy of artificial values. Secondly, according to the average density of the data set, it is divided into different density areas, and different allocation strategies are used for different areas. For data points in the high-density area, the same allocation method as the original density peak clustering is used, and for data points in the low-density area, the k-nearest neighbor method is used for clustering. Finally, the algorithm is experimentally verified on synthetic and real data sets, the clustering performance of the algorithm has been greatly improved, and the division of data sets with large density differences is also more accurate.

Key words: African vultures optimization algorithm, density peak, cut-off distance, allocation strategy

中图分类号:

TP301

罗兴隆,贺兴时,周洁,杨新社. 基于非洲秃鹫优化算法改进的密度峰值聚类[J]. 《山东大学学报(理学版)》, 2024, 59(1): 46-55,71.

Xinglong LUO,Xingshi HE,Jie ZHOU,Xinshe YANG. Improved density peak clustering approach based on African vultures optimization algorithm[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2024, 59(1): 46-55,71.

图/表 7

图1

表1

表2

表3

图2

图3

图4

参考文献 30

1	STREHL A , GHOSH J . Relationship-based clustering and visualization for high-dimensional data mining[J]. INFORMS Journal on Computing, 2003, 15 (2): 208- 230. doi: 10.1287/ijoc.15.2.208.14448
2	DONG Gaogao , TIAN Lixin , DU Ruijin , et al. Analysis of percolation behaviors of clustered networks with partial support-dependence relations[J]. Physica A: Statistical Mechanics and its Applications, 2014, 394, 370- 378. doi: 10.1016/j.physa.2013.09.055
3	RODRIGUEZ A , LAIO A . Clustering by fast search and find of density peaks[J]. Science, 2014, 344 (6191): 1492- 1496. doi: 10.1126/science.1242072
4	CHEN Yewang , LAI Dehe , QI Han , et al. A new method to estimate ages of facial image for large database[J]. Multimedia Tools and Applications, 2016, 75 (5): 2877- 2895. doi: 10.1007/s11042-015-2485-9
5	XIE Juanying , GAO Hongchao , XIE Weixin , et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J]. Information Sciences, 2016, 354, 19- 40. doi: 10.1016/j.ins.2016.03.011
6	MEHMOOD R , ZHANG Guangzhi , BIE Rongfang , et al. Clustering by fast search and find of density peaks via heat diffusion[J]. Neurocomputing, 2016, 208, 210- 217. doi: 10.1016/j.neucom.2016.01.102
7	TANG Guihua, JIA Sen, LI Jun. An enhanced density peak-based clustering approach for hyperspectral band selection[C]//2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). New York: IEEE, 2015: 1116-1119.
8	WANG Shuliang , WANG Dakui , LI Caoyuan , et al. Clustering by fast search and find of density peaks with data field[J]. Chinese Journal of Electronics, 2016, 25 (3): 397- 402. doi: 10.1049/cje.2016.05.001
9	JIANG Dong , ZANG Wenke , SUN Rui , et al. Adaptive density peaks clustering based on K-nearest neighbor and Gini coefficient[J]. IEEE Access, 2020, 8, 113900- 113917. doi: 10.1109/ACCESS.2020.3003057
10	DU Mingjing , DING Shifei , XU Xiao , et al. Density peaks clustering using geodesic distances[J]. International Journal of Machine Learning and Cybernetics, 2018, 9 (8): 1335- 1349. doi: 10.1007/s13042-017-0648-x
11	WANG Xiaofeng , XU Yifan . Fast clustering using adaptive density peak detection[J]. Statistical Methods in Medical Research, 2017, 26 (6): 2800- 2811. doi: 10.1177/0962280215609948
12	DU Mingjing , DING Shifei , JIA Hongjie . Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J]. Knowledge-Based Systems, 2016, 99, 135- 145. doi: 10.1016/j.knosys.2016.02.001
13	丁志成, 葛洪伟, 周竞. 基于KL散度的密度峰值聚类算法[J]. 重庆邮电大学学报(自然科学版), 2019, 31 (3): 367- 374.
	DING Zhicheng , GE Hongwei , ZHOU Jing . Density peaks clustering based on Kullback Leibler divergence[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2019, 31 (3): 367- 374.
14	谢娟英, 高红超, 谢维信. K近邻优化的密度峰值快速搜索聚类算法[J]. 中国科学(信息科学), 2016, 46 (2): 258- 280.
	XIE Juanying , GAO Hongchao , XIE Weixin . K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset[J]. Science China(Information Science), 2016, 46 (2): 258- 280.
15	YU Donghua , LIU Guojun , GUO Maozu , et al. Density peaks clustering based on weighted local density sequence and nearest neighbor assignment[J]. IEEE Access, 2019, 7, 34301- 34317. doi: 10.1109/ACCESS.2019.2904254
16	SEYEDI S A , LOTFI A , MORADI P , et al. Dynamic graph-based label propagation for density peaks clustering[J]. Expert Systems with Applications, 2019, 115, 314- 328. doi: 10.1016/j.eswa.2018.07.075
17	LIU Rui , WANG Hong , YU Xiaomei . Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450, 200- 226. doi: 10.1016/j.ins.2018.03.031
18	LÜ Yi , LIU Mandan , XIANG Yue . Fast searching density peak clustering algorithm based on shared nearest neighbor and adaptive clustering center[J]. Symmetry, 2020, 12 (12): 2014. doi: 10.3390/sym12122014
19	LIU Yaohui , MA Zhengming , YU Fang . Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy[J]. Knowledge-based Systems, 2017, 133, 208- 220. doi: 10.1016/j.knosys.2017.07.010
20	HOU Jian , ZHANG Aihua , QI Naiming . Density peak clustering based on relative density relationship[J]. Pattern Recognition, 2020, 108, 107554. doi: 10.1016/j.patcog.2020.107554
21	WANG Yizhang , WANG Di , ZHANG Xiaofeng , et al. McDPC: multi-center density peak clustering[J]. Neural Computing and Applications, 2020, 32 (17): 13465- 13478. doi: 10.1007/s00521-020-04754-5
22	ABDOLLAHZADEH B , GHAREHCHOPOGH F S , MIRJALILI S . African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems[J]. Computers & Industrial Engineering, 2021, 158, 107408.
23	YANG Xinshe . Nature-inspired metaheuristic algorithms[M]. Beckington: Luniver Press, 2010.
24	DUA D, KARRA TANISKIDOU E. UCI machine learning repository[EB/OL]. (2023-11-13)[2023-11-13]. http://archive.ics.uci.edu/ml.
25	CHANG Hong , YEUNG D Y . Robust path-based spectral clustering[J]. Pattern Recognition, 2008, 41 (1): 191- 203. doi: 10.1016/j.patcog.2007.04.010
26	GIONIS A , MANNILA H , TSAPARAS P . Clustering aggregation[J]. ACM Transactions on Knowledge Discovery From Data(TKDD), 2007, 1 (1): 4. doi: 10.1145/1217299.1217303
27	SUO Mingliang , ZHU Baolong , ZHOU Ding , et al. Neighborhood grid clustering and its application in fault diagnosis of satellite power system[J]. Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, 2019, 233 (4): 1270- 1283. doi: 10.1177/0954410017751991
28	JAIN A K, LAW M H C. Data clustering: a user's dilemma[C]//International Conference on Pattern Recognition and Machine Intelligence. Berlin: Springer, 2005: 1-10.
29	VEENMAN C J , REINDERS M J T , BACKER E . A maximum variance cluster algorithm[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24 (9): 1273- 1280. doi: 10.1109/TPAMI.2002.1033218
30	FU L , MEDICO E . FLAME: a novel fuzzy clustering method for the analysis of DNA microarray data[J]. BMC Bioinformatics, 2007, 8 (1): 1- 15. doi: 10.1186/1471-2105-8-1

多维度评价

Viewed

Full text

Abstract

Cited

Shared

Discussed

类型	数据集名称	样本量	维度	簇
合成数据集	Flame	240	2	2
	Aggregation	788	2	7
	Spiral	312	2	3
	D31	3 100	2	31
	Jain	373	2	2
	R15	600	2	15
	Three-circle	299	2	3
	Compound	399	2	6
	A3	266	2	3
真实数据集	Ionosphere	351	34	2
	Libras	360	90	15
	Seeds	210	7	3
	Iris	150	4	3
	Ecoli	336	8	8
	Wine	178	13	3
	Pima	768	8	2
	WDBC	569	30	2
	Waveform (noise)	5 000	40	3

算法	Flame			Aggregation			Spiral
算法	Ami	Ari	Fmi	Ami	Ari	Fmi	Ami	Ari	Fmi
K-means	0.405 9	0.453 5	0.736 1	0.774 8	0.673 5	0.742 9	-0.005 7	-0.006 3	0.328 1
DPC	1.000 0	1.000 0	1.000 0	0.992 1	0.995 6	0.996 5	1.000 0	1.000 0	1.000 0
DBSCAN	0.897 5	0.938 8	0.971 2	0.911 4	0.910 2	0.932 7	1.000 0	1.000 0	1.000 0
DPCSA	1.000 0	1.000 0	1.000 0	0.953 7	0.958 1	0.967 3	1.000 0	1.000 0	1.000 0
DPC-AVOA	1.000 0	1.000 0	1.000 0	0.991 1	0.9942	0.9954	1.000 0	1.000 0	1.000 0

算法	D31			Jain			R15
算法	Ami	Ari	Fmi	Ami	Ari	Fmi	Ami	Ari	Fmi
K-means	0.941 6	0.898 9	0.902 3	0.404 4	0.413 4	0.742 5	0.981 0	0.973 2	0.975 0
DPC	0.954 6	0.934 5	0.936 6	0.503 7	0.569 1	0.815 9	0.993 7	0.992 7	0.993 2
DBSCAN	0.889 5	0.807 8	0.818 6	0.885 1	0.975 8	0.990 6	0.982 2	0.981 9	0.983 1
DPCSA	0.955 2	0.935 3	0.937 4	0.216 7	0.044 2	0.592 4	0.988 5	0.985 7	0.986 6
DPC-AVOA	0.957 9	0.9407	0.942 6	1.000 0	1.000 0	1.000 0	0.990 7	0.989 1	0.989 8

算法	Three-circle			Compound			A3
算法	Ami	Ari	Fmi	Ami	Ari	Fmi	Ami	Ari	Fmi
K-means	0.2232	0.131 9	0.437 7	0.657 3	0.528 9	0.634 9	0.5792	0.481 0	0.661 1
DPC	1.000 0	1.000 0	1.000 0	0.775 4	0.591 0	0.687 6	0.582 8	0.487 5	0.665 0
DBSCAN	1.000 0	1.000 0	1.000 0	0.7340	0.802 9	0.862 7	0.858 7	0.866 1	0.914 4
DPCSA	0.604 3	0.515 1	0.729 6	0.8392	0.828 4	0.870 7	1.000 0	1.000 0	1.000 0
DPC-AVOA	1.000 0	1.000 0	1.000 0	0.803 1	0.830 5	0.879 6	1.000 0	1.000 0	1.000 0

算法	Ionosphere			Libras			Seeds
算法	Ami	Ari	Fmi	Ami	Ari	Fmi	Ami	Ari	Fmi
K-means	0.068 2	0.100 7	0.568 8	0.509 4	0.291 5	0.342 4	0.667 3	0.695 4	0.796 4
DPC	0.089 7	0.143 3	0.595 3	0.38 7	0.207 9	0.301 4	0.714 4	0.744 7	0.829 6
DBSCAN	0.5520	0.683 5	0.857 5	0.128 3	0.028 4	0.242 7	0.401 6	0.402 5	0.635 4
DPCSA	0.135 5	0.218 3	0.643 2	0.493 9	0.268 3	0.357 2	0.660 9	0.687 3	0.791 8
DPC-AVOA	0.372 1	0.503 4	0.782 9	0.584 2	0.381 3	0.439 5	0.807 3	0.850 6	0.900 0

算法	Iris			Ecoli			Wine
算法	Ami	Ari	Fmi	Ami	Ari	Fmi	Ami	Ari	Fmi
K-means	0.757 3	0.743 6	0.829 4	0.486 8	0.374 9	0.516 7	0.856 4	0.880 3	0.920 5
DPC	0.766 8	0.719 5	0.815 8	0.517 8	0.436 5	0.569 2	0.706 4	0.672 3	0.783 4
DBSCAN	0.676 8	0.6120	0.729 1	0.451 6	0.525 5	0.662 3	0.434 7	0.418 7	0.623 4
DPCSA	0.883 1	0.903 8	0.935 5	0.4170	0.478 3	0.674 2	0.74 8	0.741 4	0.828 3
DPC-AVOA	0.897 1	0.903 9	0.935 6	0.4730	0.547 4	0.713 3	0.843 1	0.866 6	0.911 3

算法	WDBC			Pima			Waveform (noise)
算法	Ami	Ari	Fmi	Ami	Ari	Fmi	Ami	Ari	Fmi
K-means	0.637 5	0.754 8	0.887 6	0.046 3	0.087 1	0.576 6	0.361 1	0.251 5	0.502 3
DPC	0.008 6	-0.010 7	0.715 5	0.034 1	0.0780	0.588 8	0.087 8	0.067 1	0.458 5
DBSCAN	0.333 4	0.453 9	0.747 4	0.011 3	0.041 5	0.693 6	0.000 0	0.000 0	0.577 3
DPCSA	0.336 1	0.377 1	0.759 5	0.001 7	0.014 3	0.711 9	0.152 4	0.134 9	0.462 3
DPC-AVOA	0.448 7	0.523 1	0.801 9	0.047 9	0.093 9	0.709 4	0.274 7	0.255 5	0.507 2