您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

《山东大学学报(理学版)》 ›› 2024, Vol. 59 ›› Issue (1): 46-55,71.doi: 10.6040/j.issn.1671-9352.0.2022.623

•   • 上一篇    下一篇

基于非洲秃鹫优化算法改进的密度峰值聚类

罗兴隆1(),贺兴时1,*(),周洁1,杨新社2   

  1. 1. 西安工程大学理学院, 陕西 西安 710600
    2. 密德萨斯大学科学与技术学院, 英国 剑桥 CB2 1TN
  • 收稿日期:2022-11-17 出版日期:2024-01-20 发布日期:2024-01-19
  • 通讯作者: 贺兴时 E-mail:lxl_bert@163.com;xsh1002@126.com
  • 作者简介:罗兴隆(1997—),男,硕士研究生,研究方向为聚类分析、智能计算算法. E-mail: lxl_bert@163.com
  • 基金资助:
    国家自然科学基金资助项目(12101477);陕西省教育厅自然科学基金资助项目(21JK0654)

Improved density peak clustering approach based on African vultures optimization algorithm

Xinglong LUO1(),Xingshi HE1,*(),Jie ZHOU1,Xinshe YANG2   

  1. 1. School of Science, Xi'an Polytechnic University, Xi'an 710600, Shannxi, China
    2. School of Science and Technology, Middlesex University, Cambridge CB2 1TN, UK
  • Received:2022-11-17 Online:2024-01-20 Published:2024-01-19
  • Contact: Xingshi HE E-mail:lxl_bert@163.com;xsh1002@126.com

摘要:

密度峰值聚类算法是一种自动寻找簇中心的新型快速搜索算法。针对其截断距离的不确定和一步式分配策略不稳健等缺陷, 本文提出一种基于非洲秃鹫优化算法改进的密度峰值聚类算法。通过准确率(accuracy, Acc)这一评价指标建立优化问题的目标函数, 利用非洲秃鹫优化算法强大的寻优能力对不确定的截断距离dc进行优化, 降低了人为取值的不准确性; 其次, 根据数据集密度均值将其划分为高低不同的密度区域, 对不同区域采用不同的分配策略, 针对高密度区域内的数据点采用与原密度峰值聚类相同的分配方法, 对低密度区域内数据点则根据其k近邻数量进行聚类; 最后, 将该算法在合成和真实数据集上进行实验验证, 算法的聚类性能有了很大的提升, 且对密度差异性较大的数据集划分也更加精确。

关键词: 非洲秃鹫优化算法, 密度峰值, 截断距离, 分配策略

Abstract:

Density peak clustering algorithm is a new fast search algorithm for automatically finding cluster centers. Aiming at the uncertainty of its cut-off distance and the instability of the one-step allocation strategy, an improved density peak clustering approach based on African vultures optimization algorithm is proposed. The objective function of the optimization problem is established through evaluating accuracy(Acc), and the uncertain cut-off distance dc is optimized by the powerful optimization ability of the African vultures optimization algorithm, which reduces the inaccuracy of artificial values. Secondly, according to the average density of the data set, it is divided into different density areas, and different allocation strategies are used for different areas. For data points in the high-density area, the same allocation method as the original density peak clustering is used, and for data points in the low-density area, the k-nearest neighbor method is used for clustering. Finally, the algorithm is experimentally verified on synthetic and real data sets, the clustering performance of the algorithm has been greatly improved, and the division of data sets with large density differences is also more accurate.

Key words: African vultures optimization algorithm, density peak, cut-off distance, allocation strategy

中图分类号: 

  • TP301

图1

不同dc在DPC上的聚类效果"

表1

所有数据集的信息"

类型 数据集名称 样本量 维度
合成数据集 Flame 240 2 2
Aggregation 788 2 7
Spiral 312 2 3
D31 3 100 2 31
Jain 373 2 2
R15 600 2 15
Three-circle 299 2 3
Compound 399 2 6
A3 266 2 3
真实数据集 Ionosphere 351 34 2
Libras 360 90 15
Seeds 210 7 3
Iris 150 4 3
Ecoli 336 8 8
Wine 178 13 3
Pima 768 8 2
WDBC 569 30 2
Waveform (noise) 5 000 40 3

表2

在合成数据集上的Ami、Ari、Fmi指标"

算法 Flame Aggregation Spiral
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.405 9 0.453 5 0.736 1 0.774 8 0.673 5 0.742 9 -0.005 7 -0.006 3 0.328 1
DPC 1.000 0 1.000 0 1.000 0 0.992 1 0.995 6 0.996 5 1.000 0 1.000 0 1.000 0
DBSCAN 0.897 5 0.938 8 0.971 2 0.911 4 0.910 2 0.932 7 1.000 0 1.000 0 1.000 0
DPCSA 1.000 0 1.000 0 1.000 0 0.953 7 0.958 1 0.967 3 1.000 0 1.000 0 1.000 0
DPC-AVOA 1.000 0 1.000 0 1.000 0 0.991 1 0.9942 0.9954 1.000 0 1.000 0 1.000 0
算法 D31 Jain R15
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.941 6 0.898 9 0.902 3 0.404 4 0.413 4 0.742 5 0.981 0 0.973 2 0.975 0
DPC 0.954 6 0.934 5 0.936 6 0.503 7 0.569 1 0.815 9 0.993 7 0.992 7 0.993 2
DBSCAN 0.889 5 0.807 8 0.818 6 0.885 1 0.975 8 0.990 6 0.982 2 0.981 9 0.983 1
DPCSA 0.955 2 0.935 3 0.937 4 0.216 7 0.044 2 0.592 4 0.988 5 0.985 7 0.986 6
DPC-AVOA 0.957 9 0.9407 0.942 6 1.000 0 1.000 0 1.000 0 0.990 7 0.989 1 0.989 8
算法 Three-circle Compound A3
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.2232 0.131 9 0.437 7 0.657 3 0.528 9 0.634 9 0.5792 0.481 0 0.661 1
DPC 1.000 0 1.000 0 1.000 0 0.775 4 0.591 0 0.687 6 0.582 8 0.487 5 0.665 0
DBSCAN 1.000 0 1.000 0 1.000 0 0.7340 0.802 9 0.862 7 0.858 7 0.866 1 0.914 4
DPCSA 0.604 3 0.515 1 0.729 6 0.8392 0.828 4 0.870 7 1.000 0 1.000 0 1.000 0
DPC-AVOA 1.000 0 1.000 0 1.000 0 0.803 1 0.830 5 0.879 6 1.000 0 1.000 0 1.000 0

表3

在真实数据集上的Ami、Ari、Fmi指标"

算法 Ionosphere Libras Seeds
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.068 2 0.100 7 0.568 8 0.509 4 0.291 5 0.342 4 0.667 3 0.695 4 0.796 4
DPC 0.089 7 0.143 3 0.595 3 0.38 7 0.207 9 0.301 4 0.714 4 0.744 7 0.829 6
DBSCAN 0.5520 0.683 5 0.857 5 0.128 3 0.028 4 0.242 7 0.401 6 0.402 5 0.635 4
DPCSA 0.135 5 0.218 3 0.643 2 0.493 9 0.268 3 0.357 2 0.660 9 0.687 3 0.791 8
DPC-AVOA 0.372 1 0.503 4 0.782 9 0.584 2 0.381 3 0.439 5 0.807 3 0.850 6 0.900 0
算法 Iris Ecoli Wine
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.757 3 0.743 6 0.829 4 0.486 8 0.374 9 0.516 7 0.856 4 0.880 3 0.920 5
DPC 0.766 8 0.719 5 0.815 8 0.517 8 0.436 5 0.569 2 0.706 4 0.672 3 0.783 4
DBSCAN 0.676 8 0.6120 0.729 1 0.451 6 0.525 5 0.662 3 0.434 7 0.418 7 0.623 4
DPCSA 0.883 1 0.903 8 0.935 5 0.4170 0.478 3 0.674 2 0.74 8 0.741 4 0.828 3
DPC-AVOA 0.897 1 0.903 9 0.935 6 0.4730 0.547 4 0.713 3 0.843 1 0.866 6 0.911 3
算法 WDBC Pima Waveform (noise)
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.637 5 0.754 8 0.887 6 0.046 3 0.087 1 0.576 6 0.361 1 0.251 5 0.502 3
DPC 0.008 6 -0.010 7 0.715 5 0.034 1 0.0780 0.588 8 0.087 8 0.067 1 0.458 5
DBSCAN 0.333 4 0.453 9 0.747 4 0.011 3 0.041 5 0.693 6 0.000 0 0.000 0 0.577 3
DPCSA 0.336 1 0.377 1 0.759 5 0.001 7 0.014 3 0.711 9 0.152 4 0.134 9 0.462 3
DPC-AVOA 0.448 7 0.523 1 0.801 9 0.047 9 0.093 9 0.709 4 0.274 7 0.255 5 0.507 2

图2

DPC-AVOA、DBSCAN、DPC、K-means在Jain数据集上的聚类图"

图3

DPC-AVOA、DBSCAN、DPC、K-means在three-circle数据集上的聚类图"

图4

DPC-AVOA、DBSCAN、DPC、K-means在A3数据集上的聚类图"

1 STREHL A , GHOSH J . Relationship-based clustering and visualization for high-dimensional data mining[J]. INFORMS Journal on Computing, 2003, 15 (2): 208- 230.
doi: 10.1287/ijoc.15.2.208.14448
2 DONG Gaogao , TIAN Lixin , DU Ruijin , et al. Analysis of percolation behaviors of clustered networks with partial support-dependence relations[J]. Physica A: Statistical Mechanics and its Applications, 2014, 394, 370- 378.
doi: 10.1016/j.physa.2013.09.055
3 RODRIGUEZ A , LAIO A . Clustering by fast search and find of density peaks[J]. Science, 2014, 344 (6191): 1492- 1496.
doi: 10.1126/science.1242072
4 CHEN Yewang , LAI Dehe , QI Han , et al. A new method to estimate ages of facial image for large database[J]. Multimedia Tools and Applications, 2016, 75 (5): 2877- 2895.
doi: 10.1007/s11042-015-2485-9
5 XIE Juanying , GAO Hongchao , XIE Weixin , et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J]. Information Sciences, 2016, 354, 19- 40.
doi: 10.1016/j.ins.2016.03.011
6 MEHMOOD R , ZHANG Guangzhi , BIE Rongfang , et al. Clustering by fast search and find of density peaks via heat diffusion[J]. Neurocomputing, 2016, 208, 210- 217.
doi: 10.1016/j.neucom.2016.01.102
7 TANG Guihua, JIA Sen, LI Jun. An enhanced density peak-based clustering approach for hyperspectral band selection[C]//2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). New York: IEEE, 2015: 1116-1119.
8 WANG Shuliang , WANG Dakui , LI Caoyuan , et al. Clustering by fast search and find of density peaks with data field[J]. Chinese Journal of Electronics, 2016, 25 (3): 397- 402.
doi: 10.1049/cje.2016.05.001
9 JIANG Dong , ZANG Wenke , SUN Rui , et al. Adaptive density peaks clustering based on K-nearest neighbor and Gini coefficient[J]. IEEE Access, 2020, 8, 113900- 113917.
doi: 10.1109/ACCESS.2020.3003057
10 DU Mingjing , DING Shifei , XU Xiao , et al. Density peaks clustering using geodesic distances[J]. International Journal of Machine Learning and Cybernetics, 2018, 9 (8): 1335- 1349.
doi: 10.1007/s13042-017-0648-x
11 WANG Xiaofeng , XU Yifan . Fast clustering using adaptive density peak detection[J]. Statistical Methods in Medical Research, 2017, 26 (6): 2800- 2811.
doi: 10.1177/0962280215609948
12 DU Mingjing , DING Shifei , JIA Hongjie . Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J]. Knowledge-Based Systems, 2016, 99, 135- 145.
doi: 10.1016/j.knosys.2016.02.001
13 丁志成, 葛洪伟, 周竞. 基于KL散度的密度峰值聚类算法[J]. 重庆邮电大学学报(自然科学版), 2019, 31 (3): 367- 374.
DING Zhicheng , GE Hongwei , ZHOU Jing . Density peaks clustering based on Kullback Leibler divergence[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2019, 31 (3): 367- 374.
14 谢娟英, 高红超, 谢维信. K近邻优化的密度峰值快速搜索聚类算法[J]. 中国科学(信息科学), 2016, 46 (2): 258- 280.
XIE Juanying , GAO Hongchao , XIE Weixin . K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset[J]. Science China(Information Science), 2016, 46 (2): 258- 280.
15 YU Donghua , LIU Guojun , GUO Maozu , et al. Density peaks clustering based on weighted local density sequence and nearest neighbor assignment[J]. IEEE Access, 2019, 7, 34301- 34317.
doi: 10.1109/ACCESS.2019.2904254
16 SEYEDI S A , LOTFI A , MORADI P , et al. Dynamic graph-based label propagation for density peaks clustering[J]. Expert Systems with Applications, 2019, 115, 314- 328.
doi: 10.1016/j.eswa.2018.07.075
17 LIU Rui , WANG Hong , YU Xiaomei . Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450, 200- 226.
doi: 10.1016/j.ins.2018.03.031
18 LÜ Yi , LIU Mandan , XIANG Yue . Fast searching density peak clustering algorithm based on shared nearest neighbor and adaptive clustering center[J]. Symmetry, 2020, 12 (12): 2014.
doi: 10.3390/sym12122014
19 LIU Yaohui , MA Zhengming , YU Fang . Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy[J]. Knowledge-based Systems, 2017, 133, 208- 220.
doi: 10.1016/j.knosys.2017.07.010
20 HOU Jian , ZHANG Aihua , QI Naiming . Density peak clustering based on relative density relationship[J]. Pattern Recognition, 2020, 108, 107554.
doi: 10.1016/j.patcog.2020.107554
21 WANG Yizhang , WANG Di , ZHANG Xiaofeng , et al. McDPC: multi-center density peak clustering[J]. Neural Computing and Applications, 2020, 32 (17): 13465- 13478.
doi: 10.1007/s00521-020-04754-5
22 ABDOLLAHZADEH B , GHAREHCHOPOGH F S , MIRJALILI S . African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems[J]. Computers & Industrial Engineering, 2021, 158, 107408.
23 YANG Xinshe . Nature-inspired metaheuristic algorithms[M]. Beckington: Luniver Press, 2010.
24 DUA D, KARRA TANISKIDOU E. UCI machine learning repository[EB/OL]. (2023-11-13)[2023-11-13]. http://archive.ics.uci.edu/ml.
25 CHANG Hong , YEUNG D Y . Robust path-based spectral clustering[J]. Pattern Recognition, 2008, 41 (1): 191- 203.
doi: 10.1016/j.patcog.2007.04.010
26 GIONIS A , MANNILA H , TSAPARAS P . Clustering aggregation[J]. ACM Transactions on Knowledge Discovery From Data(TKDD), 2007, 1 (1): 4.
doi: 10.1145/1217299.1217303
27 SUO Mingliang , ZHU Baolong , ZHOU Ding , et al. Neighborhood grid clustering and its application in fault diagnosis of satellite power system[J]. Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, 2019, 233 (4): 1270- 1283.
doi: 10.1177/0954410017751991
28 JAIN A K, LAW M H C. Data clustering: a user's dilemma[C]//International Conference on Pattern Recognition and Machine Intelligence. Berlin: Springer, 2005: 1-10.
29 VEENMAN C J , REINDERS M J T , BACKER E . A maximum variance cluster algorithm[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24 (9): 1273- 1280.
doi: 10.1109/TPAMI.2002.1033218
30 FU L , MEDICO E . FLAME: a novel fuzzy clustering method for the analysis of DNA microarray data[J]. BMC Bioinformatics, 2007, 8 (1): 1- 15.
doi: 10.1186/1471-2105-8-1
[1] 张一鸣,王国胤,胡军,傅顺. 基于密度峰值和网络嵌入的重叠社区发现[J]. 《山东大学学报(理学版)》, 2021, 56(1): 91-102.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 王康 李华. 化学计量学方法用于蛤青注射色谱数据重叠峰的分辨[J]. J4, 2009, 44(11): 16 -20 .
[2] 陈 莉, . 非方广义系统带干扰抑制的奇异LQ次优控制问题[J]. J4, 2006, 41(2): 74 -77 .
[3] 霍玉洪,季全宝. 一类生物细胞系统钙离子振荡行为的同步研究[J]. J4, 2010, 45(6): 105 -110 .
[4] 石长光 . Faddeev模型中的多孤立子解[J]. J4, 2007, 42(7): 38 -40 .
[5] 马继雄,江莉,祁驭矜,向凤宁,夏光敏 . 祁连龙胆愈伤组织和再生植株的生长及其两种药效成分分析[J]. J4, 2006, 41(6): 157 -160 .
[6] 谢涛,左可正. 关于两个幂等算子组合的Drazin逆的若干探讨[J]. J4, 2013, 48(4): 95 -103 .
[7] 王德良,辜娇峰, 何平 . 八大公山红腹角雉对植被因素选择的分析[J]. J4, 2009, 44(3): 17 -21 .
[8] 周伟娜,左连翠*. 几类图的笛卡尔积图的(d,1)-全标号[J]. 山东大学学报(理学版), 2014, 49(04): 24 -28 .
[9] 刘开振 孔祥智. 半超富足半群的结构[J]. J4, 2010, 45(1): 86 -88 .
[10] 娄国久 张兴秋 王建国. Banach空间一阶非线性混合型积微分方程正解的存在性[J]. J4, 2009, 44(8): 74 -79 .