JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2025, Vol. 60 ›› Issue (1): 29-44.doi: 10.6040/j.issn.1671-9352.7.2023.4130

Previous Articles    

Density peak clustering algorithm optimized by natural neighbor search

ZHANG Chunhao1, XIE Bin1,2,3*, XU Tongtong1, ZHANG Ximei1   

  1. 1. College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, Hebei, China;
    2. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics &
    Data Security, Hebei Normal University, Shijiazhuang 050024, Hebei, China;
    3. Hebei Provincial Key Laboratory of Network &
    Information Security, Hebei Normal University, Shijiazhuang 050024, Hebei, China
  • Published:2025-01-10

Abstract: We combine the natural neighbor search algorithm to improve a series of problems of the density peaks clustering(CFSFDP)algorithm, and propose the NaN-CFSFDP algorithm. First, an outlier samples detection method is proposed based on the natural neighbor search algorithm. Then, for the problem that the truncation distance dc is difficult to be taken accurately manually in the CFSFDP algorithm, the calculation of dc is improved in combination with the natural neighbor search algorithm, and the automatic taking of dc is realized. The metric rule of the sample density of the CFSFDP algorithm is redesigned and unified to make it pay more attention to the local information of each sample. Finally, to address the problem that the density peak points in the dataset may be concentrated in dense clusters due to the large density difference between clusters, which leads to cluster loss, the concepts of shared natural neighbors for samples and shared natural neighbors for clusters are proposed to construct a new cluster fusion algorithm. Experimental results on synthetic and real datasets show that the algorithm outperforms or is at least comparable to the comparative method in terms of clustering performance in most cases and has fewer parameters compared to CFSFDP algorithm and its improvements.

Key words: density peaks, natural neighbor, clustering, cluster fusion, outlier samples, truncation distance

CLC Number: 

  • TP391
[1] OKTAR Y, TURKAN M. A review of sparsity-based clustering methods[J]. Signal Processing, 2018, 148:20-30.
[2] SAXENA A, PRASAD M, GUPTA A, et al. A review of clustering techniques and developments[J]. Neurocomputing, 2017, 267:664-681.
[3] MACQUEEN J B. Some methods for classification and analysis of multivariate observations[C] //Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1967:281-297.
[4] BEZDEK J C, EHRLICH R, FULL W. FCM: the fuzzy c-means clustering algorithm[J]. Computers and Geosciences, 1984, 10(2/3):191-203.
[5] WANG Wei, YANG Jiong, MUNTZ R. STING: a statistical information grid approach to spatial data mining[C] //Proceedings of the 23rd International Conference on Very Large Data Bases. Athens: ACM, 1997:186-195.
[6] GUHA S, RASTOGI R, SHIM K. Cure: an efficient clustering algorithm for large databases[J]. Information Systems, 2001, 26(1):35-58.
[7] ESTER M, KRIEGEL H P, SANDER J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise[C] //Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. Portland Oregon, AAAI, 1996:226-231.
[8] RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496.
[9] 谢娟英,高红超,谢维信. K近邻优化的密度峰值快速搜索聚类算法[J]. 中国科学(信息科学), 2016, 46(2):258-280. XIE Juanying, GAO Hongchao, XIE Weixin. K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset[J]. Scientia Sinica(Informationis), 2016, 46(2):258-280.
[10] XIE Juanying, GAO Hongchao, XIE Weixin, et al. Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J]. Information Sciences, 2016, 354:19-40.
[11] JIANG Jianhua, CHEN Yujun, MENG Xianqiu, et al. A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process[J]. Physica A, 2019, 523:702-713.
[12] ZHANG Rui, DU Tao, QU Shouning, et al. Adaptive density-based clustering algorithm with shared KNN conflict game[J]. Information Sciences, 2021, 565:344-369.
[13] LIU Yaohui, MA Zhengming, YU Fang. Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy[J]. Knowledge-based Systems, 2017, 133:208-220.
[14] BAI Liang, CHENG Xueqi, LIANG Jiye, et al. Fast density clustering strategies based on the k-means algorithm[J]. Pattern Recognition, 2017, 71:375-386.
[15] LIU Rui, WANG Hong, YU Xiaomei. Shared-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Information Sciences, 2018, 450:200-226.
[16] 张新元,贠卫国. 共享K近邻和多分配策略的密度峰值聚类算法[J]. 小型微型计算机系统, 2023, 44(1):75-82. ZHANG Xinyuan, YUN Weiguo. Sharing K-nearest neighbors and multiple assignment policies density peaks clustering algorithm[J]. Journal of Chinese Computer Systems, 2023, 44(1):75-82.
[17] ZHANG Chunhao, XIE Bin, ZHANG Yiran. Reverse-nearest-neighbor-based clustering by fast search and find of density peaks[J]. Chinese Journal of Electronics, 2023, 32(6):1341-1354.
[18] 徐童童,解滨,张喜梅,等. 自适应聚类中心策略优化的密度峰值聚类算法[J]. 计算机工程与应用,2023,59(21):91-101. XU Tongtong, XIE Bin, ZHANG Ximei, et al. Density peak clustering algorithm optimized by adaptive clustering centers strategy[J]. Computer Engineering and Applications, 2023, 59(21):91-101.
[19] 张春昊,解滨,张喜梅,等. 一种结合自适应近邻与密度峰值的加权模糊聚类算法[J]. 小型微型计算机系统,2023, 44(9):1974-1982. ZHANG Chunhao, XIE Bin, ZHANG Ximei, et al. Weighted fuzzy clustering algorithm combining adaptive nearest neighbors and density peaks[J]. Journal of Chinese Computer Systems, 2023, 44(9):1974-1982.
[20] CHEN Yewang, TANG Shengyu, PEI Songwen. DHeat: a density heat-based algorithm for clustering with effective radius[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018, 48(4):649-660.
[21] BRYANT A, CIOS K. RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 30(6):1109-1121.
[22] ZHU Qingsheng, FENG Ji, HUANG Jinlong. Natural neighbor: a self-adaptive neighborhood method without parameter K[J]. Pattern Recognition Letters, 2016, 80:30-36.
[23] HUANG Jinlong, ZHU Qingsheng, YANG Lijun, et al. A non-parameter outlier detection algorithm based on natural neighbor[J]. Knowledge-based Systems, 2016, 92:71-77.
[24] YANG Lijun, ZHU Qingsheng, HUANG Jinlong, et al. Adaptive edited natural neighbor algorithm[J]. Neurocomputing, 2017, 230:427-433.
[25] CHENG Dongdong, ZHU Qingsheng, HUANG Jinlong, et al. Natural neighbor-based clustering algorithm with local representatives[J]. Knowledge-based Systems, 2017, 123:238-253.
[26] 陈叶旺,申莲莲,钟才明,等. 密度峰值聚类算法综述[J]. 计算机研究与发展, 2020, 57(2):378-394. CHEN Yewang, SHEN Lianlian, ZHONG Caiming, et al. Survey on density peak clustering algorithm[J]. Journal of Computer Research and Development, 2020, 57(2):378-394.
[27] FU Limin, MEDICO E. Flame: a novel fuzzy clustering method for the analysis of DNA microarray data[J]. BMC Bioinformatics, 2007, 8(3):1-15.
[28] GIONIS A, MANNILA H, TSAPARAS P. Clustering aggregation[J]. ACM Transactions on Knowledge Discovery from Data, 2007, 1(1):1-30.
[29] JAIN A K, LAW M H C. Data clustering: a users dilemma[C] //International Conference on Pattern Recognition & Machine Intelligence. Kolkata: Springer, 2005:1-10.
[30] DING Shifei, DU Mingjing, SUN Tongfeng, et al. An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood[J]. Knowledge-based Systems, 2017, 133:294-313.
[31] HUBERT L, ARABIE P. Comparing partitions[J]. Journal of Classification, 1985, 2:193-218.
[32] CHANG H, YEUNG D Y. Robust path-based spectral clustering[J]. Pattern Recognition, 2008, 41(1):191-203.
[33] VEENMAN C J, REINDERS M J T, BACKER E. A maximum variance cluster algorithm[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(9):1273-1280.
[34] ZAHN C T. Graph-theoretical methods for detecting and describing gestalt clusters[J]. IEEE Transactions on Computers, 1971, C-20(1):68-86.
[35] REZAE M, FRANTI P. Set matching measures for external cluster validity[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(8):2173-2186.
[1] ZHU Jin, FU Yu, GUAN Wenrui, WANG Pingxin. Perturbation three-way clustering based on natural nearest neighbors [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2024, 59(5): 45-51.
[2] ZHENG Chenying, CHEN Yingyue, HOU Xianyu, JIANG Lianji, LIAO Liang. A neighbourhood granular fuzzy C-means clustering algorithm [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2024, 59(5): 35-44.
[3] Jiarui SUN,Mingjing DU. Fuzzy border-peeling clustering [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2024, 59(3): 27-36, 50.
[4] Huachang XU,Qian XU,Yulin ZHAO,Fengning LIANG,Kai XU,Hong ZHU. Prediction method of IDH1 mutation status of glioma based on improved EfficientNetV2 [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2023, 58(7): 60-66.
[5] Hui MA,Lili WEI. Cluster analysis based on the hesitation triangle fuzzy correlation coefficient [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2023, 58(12): 118-126.
[6] FAN Jia-chen, WANG Ping-xin, YANG Xi-bei. Density-sensitive spectral clustering based on three-way decision [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2023, 58(1): 59-66.
[7] LI Xin-yu, FAN Hui, LIU Jing-lei. Robust clustering based on adaptive graph regularization and low-rank matrix decomposition [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2022, 57(8): 21-38.
[8] LIU Li-fang, MA Yuan-yuan. Cross-modal information retrieval method based on multi-view symmetric nonnegative matrix factorization [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2022, 57(7): 65-72.
[9] SUN Lin, LIANG Na, XU Jiu-cheng. Feature selection using adaptive neighborhood mutual information and spectral clustering [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2022, 57(12): 13-24.
[10] WU Qi-ran, ZHOU Li-kai, SUN Jin-jin, WANG Nian-ge, YU Qun-fang. Research on characteristics of air quality change in Zhejiang Province——based on functional data analysis [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2021, 56(7): 53-64.
[11] YANG Ting, ZHU Heng-dong, MA Ying-cang, WANG Yi-rui, YANG Xiao-fei. Semi-supervised spectral clustering algorithm based on L2,1 norm and manifold regularization terms [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2021, 56(3): 67-76.
[12] TANG Yi-ming, ZHANG Zheng, LU Qi-ming. Gaussian kernel fuzzy C-means clustering driven by piecewise quadratic transfer function [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2020, 55(3): 107-112.
[13] Liu-ying WEN,Wei YUAN. Clustering method for multi-label symbolic value partition [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2020, 55(3): 58-69.
[14] Zheng-yu LU,Guang-song LI,Ying-zhu SHEN,Bin ZHANG. Unknown protocol message clustering algorithm based on continuous features [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2019, 54(5): 37-43.
[15] CUI Zhao-yang, SUN Jia-qi, XU Song-yan, JIANG Xin. A secure clustering algorithm of Ad Hoc network for colony UAVs [J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2018, 53(7): 51-59.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!