JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2024, Vol. 59 ›› Issue (1): 46-55,71.doi: 10.6040/j.issn.1671-9352.0.2022.623

Previous Articles     Next Articles

Improved density peak clustering approach based on African vultures optimization algorithm

Xinglong LUO1(),Xingshi HE1,*(),Jie ZHOU1,Xinshe YANG2   

  1. 1. School of Science, Xi'an Polytechnic University, Xi'an 710600, Shannxi, China
    2. School of Science and Technology, Middlesex University, Cambridge CB2 1TN, UK
  • Received:2022-11-17 Online:2024-01-20 Published:2024-01-19
  • Contact: Xingshi HE E-mail:lxl_bert@163.com;xsh1002@126.com

Abstract:

Density peak clustering algorithm is a new fast search algorithm for automatically finding cluster centers. Aiming at the uncertainty of its cut-off distance and the instability of the one-step allocation strategy, an improved density peak clustering approach based on African vultures optimization algorithm is proposed. The objective function of the optimization problem is established through evaluating accuracy(Acc), and the uncertain cut-off distance dc is optimized by the powerful optimization ability of the African vultures optimization algorithm, which reduces the inaccuracy of artificial values. Secondly, according to the average density of the data set, it is divided into different density areas, and different allocation strategies are used for different areas. For data points in the high-density area, the same allocation method as the original density peak clustering is used, and for data points in the low-density area, the k-nearest neighbor method is used for clustering. Finally, the algorithm is experimentally verified on synthetic and real data sets, the clustering performance of the algorithm has been greatly improved, and the division of data sets with large density differences is also more accurate.

Key words: African vultures optimization algorithm, density peak, cut-off distance, allocation strategy

CLC Number: 

  • TP301

Fig.1

Clustering effects different dc on DPC"

Table 1

Information for all datasets"

类型 数据集名称 样本量 维度
合成数据集 Flame 240 2 2
Aggregation 788 2 7
Spiral 312 2 3
D31 3 100 2 31
Jain 373 2 2
R15 600 2 15
Three-circle 299 2 3
Compound 399 2 6
A3 266 2 3
真实数据集 Ionosphere 351 34 2
Libras 360 90 15
Seeds 210 7 3
Iris 150 4 3
Ecoli 336 8 8
Wine 178 13 3
Pima 768 8 2
WDBC 569 30 2
Waveform (noise) 5 000 40 3

Table 2

Ami, Ari and Fmi indices on the artificial dataset"

算法 Flame Aggregation Spiral
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.405 9 0.453 5 0.736 1 0.774 8 0.673 5 0.742 9 -0.005 7 -0.006 3 0.328 1
DPC 1.000 0 1.000 0 1.000 0 0.992 1 0.995 6 0.996 5 1.000 0 1.000 0 1.000 0
DBSCAN 0.897 5 0.938 8 0.971 2 0.911 4 0.910 2 0.932 7 1.000 0 1.000 0 1.000 0
DPCSA 1.000 0 1.000 0 1.000 0 0.953 7 0.958 1 0.967 3 1.000 0 1.000 0 1.000 0
DPC-AVOA 1.000 0 1.000 0 1.000 0 0.991 1 0.9942 0.9954 1.000 0 1.000 0 1.000 0
算法 D31 Jain R15
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.941 6 0.898 9 0.902 3 0.404 4 0.413 4 0.742 5 0.981 0 0.973 2 0.975 0
DPC 0.954 6 0.934 5 0.936 6 0.503 7 0.569 1 0.815 9 0.993 7 0.992 7 0.993 2
DBSCAN 0.889 5 0.807 8 0.818 6 0.885 1 0.975 8 0.990 6 0.982 2 0.981 9 0.983 1
DPCSA 0.955 2 0.935 3 0.937 4 0.216 7 0.044 2 0.592 4 0.988 5 0.985 7 0.986 6
DPC-AVOA 0.957 9 0.9407 0.942 6 1.000 0 1.000 0 1.000 0 0.990 7 0.989 1 0.989 8
算法 Three-circle Compound A3
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.2232 0.131 9 0.437 7 0.657 3 0.528 9 0.634 9 0.5792 0.481 0 0.661 1
DPC 1.000 0 1.000 0 1.000 0 0.775 4 0.591 0 0.687 6 0.582 8 0.487 5 0.665 0
DBSCAN 1.000 0 1.000 0 1.000 0 0.7340 0.802 9 0.862 7 0.858 7 0.866 1 0.914 4
DPCSA 0.604 3 0.515 1 0.729 6 0.8392 0.828 4 0.870 7 1.000 0 1.000 0 1.000 0
DPC-AVOA 1.000 0 1.000 0 1.000 0 0.803 1 0.830 5 0.879 6 1.000 0 1.000 0 1.000 0

Table 3

Ami, Ari and Fmi indices for real-world datasets"

算法 Ionosphere Libras Seeds
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.068 2 0.100 7 0.568 8 0.509 4 0.291 5 0.342 4 0.667 3 0.695 4 0.796 4
DPC 0.089 7 0.143 3 0.595 3 0.38 7 0.207 9 0.301 4 0.714 4 0.744 7 0.829 6
DBSCAN 0.5520 0.683 5 0.857 5 0.128 3 0.028 4 0.242 7 0.401 6 0.402 5 0.635 4
DPCSA 0.135 5 0.218 3 0.643 2 0.493 9 0.268 3 0.357 2 0.660 9 0.687 3 0.791 8
DPC-AVOA 0.372 1 0.503 4 0.782 9 0.584 2 0.381 3 0.439 5 0.807 3 0.850 6 0.900 0
算法 Iris Ecoli Wine
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.757 3 0.743 6 0.829 4 0.486 8 0.374 9 0.516 7 0.856 4 0.880 3 0.920 5
DPC 0.766 8 0.719 5 0.815 8 0.517 8 0.436 5 0.569 2 0.706 4 0.672 3 0.783 4
DBSCAN 0.676 8 0.6120 0.729 1 0.451 6 0.525 5 0.662 3 0.434 7 0.418 7 0.623 4
DPCSA 0.883 1 0.903 8 0.935 5 0.4170 0.478 3 0.674 2 0.74 8 0.741 4 0.828 3
DPC-AVOA 0.897 1 0.903 9 0.935 6 0.4730 0.547 4 0.713 3 0.843 1 0.866 6 0.911 3
算法 WDBC Pima Waveform (noise)
Ami Ari Fmi Ami Ari Fmi Ami Ari Fmi
K-means 0.637 5 0.754 8 0.887 6 0.046 3 0.087 1 0.576 6 0.361 1 0.251 5 0.502 3
DPC 0.008 6 -0.010 7 0.715 5 0.034 1 0.0780 0.588 8 0.087 8 0.067 1 0.458 5
DBSCAN 0.333 4 0.453 9 0.747 4 0.011 3 0.041 5 0.693 6 0.000 0 0.000 0 0.577 3
DPCSA 0.336 1 0.377 1 0.759 5 0.001 7 0.014 3 0.711 9 0.152 4 0.134 9 0.462 3
DPC-AVOA 0.448 7 0.523 1 0.801 9 0.047 9 0.093 9 0.709 4 0.274 7 0.255 5 0.507 2

Fig.2

Cluster plot of DPC-AVOA, DBSCAN, DPC, and K-means on Jain database"

Fig.3

Cluster plot of DPC-AVOA, DBSCAN, DPC, and K-means on three-circle database"

Fig.4

Cluster plot of DPC-AVOA, DBSCAN, DPC, and K-means on A3 database"

1 STREHL A , GHOSH J . Relationship-based clustering and visualization for high-dimensional data mining[J]. INFORMS Journal on Computing, 2003, 15 (2): 208- 230.
doi: 10.1287/ijoc.15.2.208.14448
2 DONG Gaogao , TIAN Lixin , DU Ruijin , et al. Analysis of percolation behaviors of clustered networks with partial support-dependence relations[J]. Physica A: Statistical Mechanics and its Applications, 2014, 394, 370- 378.
doi: 10.1016/j.physa.2013.09.055
3 RODRIGUEZ A , LAIO A . Clustering by fast search and find of density peaks[J]. Science, 2014, 344 (6191): 1492- 1496.
doi: 10.1126/science.1242072
4 CHEN Yewang , LAI Dehe , QI Han , et al. A new method to estimate ages of facial image for large database[J]. Multimedia Tools and Applications, 2016, 75 (5): 2877- 2895.
doi: 10.1007/s11042-015-2485-9
5 XIE Juanying , GAO Hongchao , XIE Weixin , et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J]. Information Sciences, 2016, 354, 19- 40.
doi: 10.1016/j.ins.2016.03.011
6 MEHMOOD R , ZHANG Guangzhi , BIE Rongfang , et al. Clustering by fast search and find of density peaks via heat diffusion[J]. Neurocomputing, 2016, 208, 210- 217.
doi: 10.1016/j.neucom.2016.01.102
7 TANG Guihua, JIA Sen, LI Jun. An enhanced density peak-based clustering approach for hyperspectral band selection[C]//2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). New York: IEEE, 2015: 1116-1119.
8 WANG Shuliang , WANG Dakui , LI Caoyuan , et al. Clustering by fast search and find of density peaks with data field[J]. Chinese Journal of Electronics, 2016, 25 (3): 397- 402.
doi: 10.1049/cje.2016.05.001
9 JIANG Dong , ZANG Wenke , SUN Rui , et al. Adaptive density peaks clustering based on K-nearest neighbor and Gini coefficient[J]. IEEE Access, 2020, 8, 113900- 113917.
doi: 10.1109/ACCESS.2020.3003057
10 DU Mingjing , DING Shifei , XU Xiao , et al. Density peaks clustering using geodesic distances[J]. International Journal of Machine Learning and Cybernetics, 2018, 9 (8): 1335- 1349.
doi: 10.1007/s13042-017-0648-x
11 WANG Xiaofeng , XU Yifan . Fast clustering using adaptive density peak detection[J]. Statistical Methods in Medical Research, 2017, 26 (6): 2800- 2811.
doi: 10.1177/0962280215609948
12 DU Mingjing , DING Shifei , JIA Hongjie . Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J]. Knowledge-Based Systems, 2016, 99, 135- 145.
doi: 10.1016/j.knosys.2016.02.001
13 丁志成, 葛洪伟, 周竞. 基于KL散度的密度峰值聚类算法[J]. 重庆邮电大学学报(自然科学版), 2019, 31 (3): 367- 374.
DING Zhicheng , GE Hongwei , ZHOU Jing . Density peaks clustering based on Kullback Leibler divergence[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2019, 31 (3): 367- 374.
14 谢娟英, 高红超, 谢维信. K近邻优化的密度峰值快速搜索聚类算法[J]. 中国科学(信息科学), 2016, 46 (2): 258- 280.
XIE Juanying , GAO Hongchao , XIE Weixin . K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset[J]. Science China(Information Science), 2016, 46 (2): 258- 280.
15 YU Donghua , LIU Guojun , GUO Maozu , et al. Density peaks clustering based on weighted local density sequence and nearest neighbor assignment[J]. IEEE Access, 2019, 7, 34301- 34317.
doi: 10.1109/ACCESS.2019.2904254
16 SEYEDI S A , LOTFI A , MORADI P , et al. Dynamic graph-based label propagation for density peaks clustering[J]. Expert Systems with Applications, 2019, 115, 314- 328.
doi: 10.1016/j.eswa.2018.07.075
17 LIU Rui , WANG Hong , YU Xiaomei . Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450, 200- 226.
doi: 10.1016/j.ins.2018.03.031
18 LÜ Yi , LIU Mandan , XIANG Yue . Fast searching density peak clustering algorithm based on shared nearest neighbor and adaptive clustering center[J]. Symmetry, 2020, 12 (12): 2014.
doi: 10.3390/sym12122014
19 LIU Yaohui , MA Zhengming , YU Fang . Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy[J]. Knowledge-based Systems, 2017, 133, 208- 220.
doi: 10.1016/j.knosys.2017.07.010
20 HOU Jian , ZHANG Aihua , QI Naiming . Density peak clustering based on relative density relationship[J]. Pattern Recognition, 2020, 108, 107554.
doi: 10.1016/j.patcog.2020.107554
21 WANG Yizhang , WANG Di , ZHANG Xiaofeng , et al. McDPC: multi-center density peak clustering[J]. Neural Computing and Applications, 2020, 32 (17): 13465- 13478.
doi: 10.1007/s00521-020-04754-5
22 ABDOLLAHZADEH B , GHAREHCHOPOGH F S , MIRJALILI S . African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems[J]. Computers & Industrial Engineering, 2021, 158, 107408.
23 YANG Xinshe . Nature-inspired metaheuristic algorithms[M]. Beckington: Luniver Press, 2010.
24 DUA D, KARRA TANISKIDOU E. UCI machine learning repository[EB/OL]. (2023-11-13)[2023-11-13]. http://archive.ics.uci.edu/ml.
25 CHANG Hong , YEUNG D Y . Robust path-based spectral clustering[J]. Pattern Recognition, 2008, 41 (1): 191- 203.
doi: 10.1016/j.patcog.2007.04.010
26 GIONIS A , MANNILA H , TSAPARAS P . Clustering aggregation[J]. ACM Transactions on Knowledge Discovery From Data(TKDD), 2007, 1 (1): 4.
doi: 10.1145/1217299.1217303
27 SUO Mingliang , ZHU Baolong , ZHOU Ding , et al. Neighborhood grid clustering and its application in fault diagnosis of satellite power system[J]. Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, 2019, 233 (4): 1270- 1283.
doi: 10.1177/0954410017751991
28 JAIN A K, LAW M H C. Data clustering: a user's dilemma[C]//International Conference on Pattern Recognition and Machine Intelligence. Berlin: Springer, 2005: 1-10.
29 VEENMAN C J , REINDERS M J T , BACKER E . A maximum variance cluster algorithm[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24 (9): 1273- 1280.
doi: 10.1109/TPAMI.2002.1033218
30 FU L , MEDICO E . FLAME: a novel fuzzy clustering method for the analysis of DNA microarray data[J]. BMC Bioinformatics, 2007, 8 (1): 1- 15.
doi: 10.1186/1471-2105-8-1
[1] Yi-ming ZHANG,Guo-yin WANG,Jun HU,Shun FU. Overlapping community detection based on density peaks and network embedding [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2021, 56(1): 91-102.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] WANG Kang, LI Hua. Analysis of the compound Haqing injection with hyphenated chromatography and chemometric resolution[J]. J4, 2009, 44(11): 16 -20 .
[2] CHEN Li, . Singular LQ suboptimal control problem with disturbance rejection[J]. J4, 2006, 41(2): 74 -77 .
[3] HUO Yu-hong, JI Quan-bao. Synchronization analysis of oscillatory activities in a biological cell system[J]. J4, 2010, 45(6): 105 -110 .
[4] SHI Chang-guang . Multi-soliton solution of the Faddeev model[J]. J4, 2007, 42(7): 38 -40 .
[5] MA Jie-xiong,JIANG Li,QI Yu-yu,XIANG FEng-ning,XIA Guang-min . The growth of calli and regenerated plantlets of Gentiana Przewalskii Maxim. and the constituents analysis of its two effective components[J]. J4, 2006, 41(6): 157 -160 .
[6] XIE Tao, ZUO Ke-zheng. [J]. J4, 2013, 48(4): 95 -103 .
[7] WANG Deliang, GU Jiaofeng, HE Ping. [J]. J4, 2009, 44(3): 17 -21 .
[8] ZHOU Wei-na, ZUO Lian-cui*. A(d,1)-total labeling of Cartesian products of some classes of graphs#br#[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(04): 24 -28 .
[9] LIU Kai-Zhen, KONG Xiang-Zhi. The structure of semi-superabundant semigroups[J]. J4, 2010, 45(1): 86 -88 .
[10] LOU Guo-Jiu, ZHANG Xin-Qiu, WANG Jian-Guo. Existence of a positive solution for first order nonlinear impulsive singular differential equations of mixed type in Banach spaces[J]. J4, 2009, 44(8): 74 -79 .