JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2024, Vol. 59 ›› Issue (3): 27-36, 50.doi: 10.6040/j.issn.1671-9352.4.2023.040

Previous Articles     Next Articles

Fuzzy border-peeling clustering

Jiarui SUN(),Mingjing DU*()   

  1. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221100, Jiangsu, China
  • Received:2023-06-02 Online:2024-03-20 Published:2024-03-06
  • Contact: Mingjing DU E-mail:sunjr@jsnu.edu.cn;dumj@jsnu.edu.cn

Abstract:

A fuzzy border-peeling clustering (FBP) algorithm is proposed. First, a density estimation method based on Cauchy kernel is used to calculate the densities of data points. Secondly, the boundary data are separated from the core data using the layer-by-layer peeling strategy. Thirdly, the reachability between the core data is used to achieve the core region clustering. Finally, a fuzzy assignment strategy is used to achieve the soft partitioning of the boundary data. A comparison is made between the fuzzy border-peeling clustering and 10 benchmark algorithms, including 6 density-based clustering algorithms and 4 fuzzy clustering algorithms, on artificial and real-world datasets. The experimental results show that on all datasets, FBP has the ARI (adjusted rand index) increased by 21% to 60% on average, and FBP has the NMI (normalized mutual information) increased by 12% to 47% on average. The border-peeling clustering algorithm optimized based on Cauchy kernel and fuzzy assignment strategy significantly improves the accuracy of clustering.

Key words: density-based clustering, border-peeling clustering, fuzzy clustering, soft clustering, Cauchy kernel function

CLC Number: 

  • TP18

Table 1

Summary of notations"

符号 概念
n 数据点数目
k 近邻数目
t t次剥离
xi 任意数据点
X 数据集合
X(t) t次剥离, 剩余数据集合
NNk(t) (xi) t次剥离, 数据点xi的第k近邻
Nk(t) (xi) t次剥离, 数据点xik近邻集合
RNk(t) (xi) t次剥离, 数据点xi的反k近邻集合

Fig.1

FBP clustering algorithm process"

Table 2

Experimental data sets"

数据集名称 数据量 维数 类数
Jain 373 2 2
DD 1 300 2 3
Pathbased 300 2 3
Handl 715 2 3
SF 4 096 2 4
T4 7 236 2 6
Glass 214 10 6
Control 600 60 6
Dig 1 797 64 10
PageB 5 473 10 5
Optdigits 5 620 64 10
Crowd 10 845 28 6

Fig.2

Clustering results on synthetic datasets"

Table 3

Performance of the evaluated algorithms on synthetic datasets"

AlgorithmJainDD
ARI NMI params ARI NMI params
FBP 1.000 1.000 k=19 0.992 0.980 k=24
BP 0.478 0.677 k=25 0.154 0.530 k=25
DBSCAN 1.000 1.000 ε=3.1, MinPts=28 0.962 0.856 ε=0.1, MinPts=5
HDBSCAN 0.209 0.372 k=5, mc=5 0.527 0.488 k=28, mc=20
DPC 0.334 0.617 dc=6%, α=2.5 0.256 0.619 dc=4%, α=2
DPC-CE 0.304 0.609 dc=2%, α=2.5 0.214 0.589 dc=4%, α=2
VDPC 1.000 1.000 dc=5%, δ=5 0.302 0.509 dc=3%, δ=1
FCM 0.318 0.367 M=1.5 0.213 0.277 M=4
kFCM 0.239 0.323 M=3.5, δ=2 0.204 0.299 M=2, δ=2
MkFC 0.421 0.448 M=1.5 0.221 0.245 M=3.5
PCM 0.272 0.362 M=3.5 0.211 0.287 M=2
AlgorithmPathbasedHandl
ARI NMI paraMs ARI NMI params
FBP 1.000 1.000 k=9 0.988 0.974 k=30
BP 0.980 0.969 k=9 0.659 0.785 k=30
DBSCAN 0.901 0.871 ε=3.35, MinPts=30 0.720 0.668 ε=0.1, MinPts=5
HDBSCAN 0.016 0.126 k=5, mc=20 0.250 0.358 k=9, mc=15
DPC 0.342 0.556 dc=2%, α=0.5 0.844 0.828 dc=5%, α=1.5
DPC-CE 0.520 0.611 dc=4%, α=1 0.762 0.818 dc=8%, α=2.5
VDPC 0.661 0.707 dc=2%, δ=2 0.762 0.818 dc=5%, δ=1
FCM 0.465 0.550 M=1.5 0.625 0.654 M=1.5
KFCM 0.454 0.545 M=1.5, δ=1 0.560 0.672 M=3, δ=0.5
MKFC 0.426 0.497 M=4 0.568 0.623 M=2
PCM 0.409 0.513 M=4 0.330 0.437 M=4
AlgorithmSFT4
ARI NMI params ARI NMI params
FBP 0.978 0.953 k=23 1.000 1.000 k=26
BP 0.201 0.599 k=30 0.361 0.737 k=50
DBSCAN 0.135 0.490 ε=0.1, MinPts=5 0.999 0.997 ε=0.1, MinPts=16
HDBSCAN 0.127 0.304 k=21, mc=20 0.271 0.518 k=46, mc=20
DPC 0.553 0.754 dc=4%, α=2.5 0.828 0.883 dc=5%, α=2.5
DPC-CE 0.346 0.662 dc=6%, α=2.5 0.541 0.798 dc=5%, α=2.5
VDPC 0.894 0.898 dc=2%, δ=2 0.724 0.826 dc=2%, δ=1
FCM 0.452 0.495 M=1.5 0.634 0.724 M=4
KFCM 0.542 0.560 M=2.5, δ=1.5 0.644 0.734 M=3, δ=1.5
MKFC 0.423 0.474 M=4 0.432 0.533 M=4
PCM 0.329 0.416 M=4 0.372 0.601 M=2.5

Table 4

Performance of the evaluated algorithms on real-world datasets"

AlgorithmGlassControl
ARI NMI params ARI NMI params
FBP 0.693 0.769 k=6 0.680 0.856 k=19
BP 0.393 0.670 k=5 0.626 0.805 k=10
DBSCAN 0.575 0.678 ε=4.35, MinPts=9 0.136 0.480 ε=1.35, MinPts=30
HDBSCAN 0.255 0.370 k=5, mc=20 0.236 0.476 k=13, mc=20
DPC 0.672 0.753 dc=10%, α=2.5 0.540 0.741 dc=10%, α=2.5
DPC-CE 0.642 0.765 dc=8%, α=2 0.543 0.744 dc=9%, α=2.5
VDPC 0.363 0.713 dc=5%, δ=5 0.554 0.681 dc=10%, δ=1
FCM 0.570 0.762 m=2 0.619 0.732 m=1.5
KFCM 0.201 0.467 m=1.5, δ=2 0.660 0.754 m=1.5, δ=2
MKFC 0.242 0.344 m=1.5 0.646 0.763 m=1.5
PCM 0.282 0.464 m=2 0.536 0.686 m=1.5
AlgorithmDigPageB
ARI NMI params ARI NMI params
FBP 0.795 0.858 k=9 0.413 0.305 k=20
BP 0.397 0.698 k=10 0.095 0.210 k=23
DBSCAN 0.557 0.741 ε=1.35, MinPts=4 0.434 0.248 ε=0.1, MinPts=50
HDBSCAN 0.355 0.675 k=2, mc=5 0.318 0.130 k=29, mc=20
DPC 0.781 0.849 dc=2%, α=2.5 0.335 0.213 dc=8%, α=1
DPC-CE 0.714 0.826 dc=4%, α=2.5 0.384 0.255 dc=8%, α=1
VDPC 0.180 0.597 dc=2%, δ=2 0.012 0.050 dc=10%, δ=1
FCM 0.251 0.464 M=1.5 0.091 0.142 M=1.5
KFCM 0.452 0.559 M=2.5, δ=1 0.076 0.147 M=1.5, δ=1.5
MKFC 0.000 0.000 M=1.5 0.049 0.090 M=1.5
PCM 0.254 0.432 M=2.5 0.036 0.052 M=1.5
AlgorithmOptdigitsCrowd
ARI NMI params ARI NMI params
FBP 0.832 0.877 k=12 0.425 0.403 k=46
BP 0.095 0.210 k=23 0.354 0.349 k=46
DBSCAN 0.532 0.732 ε=1.35, MinPts=23 0.263 0.339 ε=0.35, MinPts=35
HDBSCAN 0.409 0.653 k=2, mc=5 0.285 0.347 k=41, mc=5
DPC 0.736 0.808 dc=2%, α=2.5 0.006 0.365 dc=3%, α=1
DPC-CE 0.723 0.812 dc=4%, α=2.5 0.031 0.371 dc=2%, α=1
VDPC 0.141 0.576 dc=2%, δ=2 0.000 0.000 dc=2%, δ=2
FCM 0.249 0.408 M=3.5 0.106 0.242 M=1.5
KFCM 0.356 0.528 M=2.5, δ=1 0.121 0.246 M=1.5, δ=2
MKFC 0.000 0.000 M=1.5 0.174 0.258 M=1.5
PCM 0.246 0.414 M=2 0.171 0.114 M=1.5
1 徐晓, 丁世飞, 丁玲. 密度峰值聚类算法研究进展[J]. 软件学报, 2022, 33 (5): 1800- 1816.
XU Xiao , DING Shifei , DING Ling . Survey on density peaks clustering algorithm[J]. Journal of Software, 2022, 33 (5): 1800- 1816.
2 PENG D , GUI Z , WANG D , et al. Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity[J]. Nature Communications, 2022, 13 (1): 5455.
doi: 10.1038/s41467-022-33136-9
3 ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C]//Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. Menlo Park: AAAI Press, 1996: 226-231.
4 CAMPELLO R J, MOULAVI D, SANDER J. Density-based clustering based on hierarchical density estimates[C]//Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Berlin: Springer, 2013: 160-172.
5 RODRIGUEZ A , LAIO A . Clustering by fast search and find of density peaks[J]. Science, 2014, 344 (6191): 1492- 1496.
doi: 10.1126/science.1242072
6 CHEN H , LIANG M , LIU W , et al. An approach to boundary detection for 3D point clouds based on dbscan clustering[J]. Pattern Recognition, 2022, 124, 108431.
doi: 10.1016/j.patcog.2021.108431
7 OUYANG T , PEDRYCZ W , PIZZI N J . Rule-based modeling with dbscan-based information granules[J]. IEEE Transactions on Cybernetics, 2021, 51 (7): 3653- 3663.
doi: 10.1109/TCYB.2019.2902603
8 LU J , ZHAO Y , TAN K L , et al. Distributed density peaks clustering revisited[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34 (8): 3714- 3726.
doi: 10.1109/TKDE.2020.3034611
9 RASOOL Z , ZHOU R , CHEN L , et al. Index-based solutions for efficient density peak clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2022, 34 (5): 2212- 2226.
doi: 10.1109/TKDE.2020.3004221
10 盛锦超, 杜明晶, 李宇蕊, 等. 结合柯西核的分类型数据密度峰值聚类算法[J]. 计算机工程与应用, 2022, 58 (18): 162- 171.
SHENG Jingchao , DU Mingjing , LI Yurui , et al. Cauchy kernel-based density peaks clustering algorithm for categorical data[J]. Computer Engineering and Applications, 2022, 58 (18): 162- 171.
11 AVERBUCH-ELOR H , BAR N , COHEN-OR D . Border-peeling clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42 (7): 1791- 1797.
doi: 10.1109/TPAMI.2019.2924953
12 ANKERST M, BREUNIG M M, KRIEGEL H P, et al. Optics: ordering points to identify the clustering structure[C]//Proceedings of the 1999 International Conference on Management of Data. New York: ACM, 1999: 49-60.
13 GUO W , WANG W , ZHAO S , et al. Density peak clustering with connectivity estimation[J]. Knowledge-Based Systems, 2022, 243, 108501.
doi: 10.1016/j.knosys.2022.108501
14 WANG Y , WANG D , ZHOU Y , et al. VDPC: variational density peak clustering algorithm[J]. Information Sciences, 2023, 621, 627- 651.
doi: 10.1016/j.ins.2022.11.091
15 DU M , WANG R , JI R , et al. Robp a robust border-peeling clustering using Cauchy kernel[J]. Information Sciences, 2021, 571, 375- 400.
doi: 10.1016/j.ins.2021.04.089
16 陈延伟, 赵兴旺. 基于边界点检测的变密度聚类算法[J]. 计算机应用, 2022, 42 (8): 2450- 2460.
CHEN Yanwei , ZHAO Xingwang . Varied density clustering algorithm based on border point detection[J]. Journal of Computer Applications, 2022, 42 (8): 2450- 2460.
17 张柏恺, 杨德刚, 冯骥. 一种去除聚类数量k和邻域参数c设置的自适应聚类算法[J]. 计算机工程与科学, 2021, 43 (10): 1838- 1847.
doi: 10.3969/j.issn.1007-130X.2021.10.018
ZHANG Bokai , YANG Degang , FENG Ji . A self-adaptive clustering algorithm without neighborhood parameter k and cluster number c[J]. Computer Engineering & Science, 2021, 43 (10): 1838- 1847.
doi: 10.3969/j.issn.1007-130X.2021.10.018
18 DUNN J C . A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters[J]. Journal of Cybernetics, 1973, 3, 32- 57.
19 KARAYIANNIS N B. Meca: maximum entropy clustering algorithm[C]//Proceedings of 3rd IEEE International Conference on Fuzzy Systems. Piscataway: IEEE, 1994: 630-635.
20 GAN G, WU J, YANG Z. A fuzzy subspace algorithm for clustering high dimensional data[C]//Proceedings of 2nd International Conference on Advanced Data Mining and Applications. Berlin: Springer, 2006: 271-278.
21 FAZENDEIRO P , DE OLIVEIRA J V . Observer-biased fuzzy clustering[J]. IEEE Transactions on Fuzzy Systems, 2014, 23 (1): 85- 97.
22 KRISHNAPURAM R , KELLER J M . The possibilistic c-means algorithm: insights and recommendations[J]. IEEE Transactions on Fuzzy Systems, 1996, 4 (3): 385- 393.
23 ZHANG D Q , CHEN S C . Clustering incomplete data using kernel-based fuzzy c-means algorithm[J]. Neural Processing Letters, 2003, 18 (3): 155- 162.
24 HUANG H C , CHUANG Y Y , CHEN C S . Multiple kernel fuzzy clustering[J]. IEEE Transactions on Fuzzy Systems, 2011, 20 (1): 120- 134.
25 DING S , DU M , SUN T , et al. An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood[J]. Knowledge-Based Systems, 2017, 133, 294- 313.
[1] ZHANG Cong, YU Hong. An incremental three-way decisions soft clustering algorithm [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(08): 40-47.
[2] WAN Run-ze1, LEI Jian-jun1, YUAN Cao2. An optimization strategy for managing dormant nodes in wireless sensor networks base on fuzzy clustering [J]. J4, 2013, 48(09): 17-21.
[3] FU Xue-feng,LIU Qiu-yun,WANG Ming-wen . Rough sets information retrieval model based on multual information [J]. J4, 2006, 41(3): 116-119 .
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] LIANG Xiao, WANG Linshan. Global attractor of a class of recurrent neural network with Stype distributed delays[J]. J4, 2009, 44(4): 57 -60 .
[2] DONG Xin-mei . On problems of Suryanarayana[J]. J4, 2007, 42(2): 83 -86 .
[3] SONG Yu-dan, WANG Shi-tong*. Minimum within-class variance SVM with absent features[J]. J4, 2010, 45(7): 102 -107 .
[4] SHI Yan-hua1, SHI Dong-yang2*. The quasi-Wilson nonconforming finite element approximation to  pseudo-hyperbolic equations[J]. J4, 2013, 48(4): 77 -84 .
[5] LIU Ru-jun,CAO Yu-xia,ZHOU Ping . Anti-control for discrete chaos systems by small feedback[J]. J4, 2007, 42(7): 30 -32 .
[6] ZHANG Xue-feng1, LIU Peng1,2. An improved K-means algorithm by weighted distance based on maximum between-cluster variation[J]. J4, 2010, 45(7): 28 -33 .
[7] WU Da-hua, HE Zhen-feng*. Improvement of cluster-based genetic segmentation of time series algorithm[J]. J4, 2010, 45(7): 45 -49 .
[8] WANG Gui-Jie, WANG Wen-De, TAO De-Li. Study and application of CAD technology in grade crossing vertical design[J]. J4, 2009, 44(11): 79 -82 .
[9] XU Chun-hua,GAO Bao-yu,LU Lei,XU Shi-ping,CAO Bai-chuan,YUE Qin-yan and ZHANG Jian . Study of chemically enhanced primary treatment of wastewater received by urban rivers[J]. J4, 2006, 41(2): 116 -120 .
[10] CHEN Hong-yu1, ZHANG Li2. The linear 2-arboricity of planar graphs without 5-, 6-cycles with chord[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(06): 26 -30 .