基于自适应图调节和低秩矩阵分解的鲁棒聚类

doi:10.6040/j.issn.1671-9352.7.2021.069

摘要/Abstract

摘要： 聚类是数据挖掘和机器学习领域的重要研究内容,一般会先基于数据样本构建相似图,再基于相似图将样本划分到相应的类中。但是真实的数据经常被损坏,导致学习的相似图不准确,从而直接影响聚类结果。为解决这些问题,提出一种面向鲁棒聚类的自适应图调节和低秩矩阵分解的方法,该方法的核心思想是:将原始数据X分解为纯净数据D和噪声数据S,再基于纯净数据构造拉普拉斯矩阵并进行自适应图调节。随后,给出一个联合学习框架,将数据分离、自适应图正则、噪声消除和低秩矩阵分解集成到一个目标函数中。利用增广拉格朗日乘子法分别更新变量。最后,在理论上证明算法的收敛性并进行实验。实验结果表明所提出的方法与现有一些方法相比有一定优越性。

关键词: 鲁棒聚类, L_2,1范数, 低秩矩阵分解, 自适应图调节, 增广拉格朗日乘子法

Abstract: Clustering is an important research content in the field of data mining and machine learning. Generally, similarity graphs are constructed based on data samples, and then the samples are divided into corresponding classes based on the similarity graphs. However, the real data is often damaged, resulting in inaccurate similarity graphs, which directly affects the clustering results. In order to solve these problems, a robust clustering-oriented adaptive graph adjustment and low-rank matrix decomposition method is proposed. The core idea of the method is to decompose the original data X into pure data D and noisy data S, then construct a Laplacian matrix based on pure data and perform adaptive graph adjustment. Subsequently, a joint learning framework is given, which integrates data separation, adaptive graph regularization, noise removal and low-rank matrix decomposition into an objective function. Use augmented Lagrangian multiplier method to update variables separately. Finally, the paper theoretically proves the convergence of the algorithm and conduct experiments. The experimental results show that the proposed method has certain advantages compared with some existing methods.

Key words: robust clustering, L_2,1 norm, low-rank matrix decomposition, adaptive graph regularization, augmented Lagrange multiplier method

中图分类号:

TP18

李心雨,范辉,刘惊雷. 基于自适应图调节和低秩矩阵分解的鲁棒聚类[J]. 《山东大学学报(理学版)》, 2022, 57(8): 21-38.

LI Xin-yu, FAN Hui, LIU Jing-lei. Robust clustering based on adaptive graph regularization and low-rank matrix decomposition[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2022, 57(8): 21-38.

参考文献

[1] HUANG Shudong, KANG Zhao, TSANG I. Auto-weighted multiview clustering via kernelized graph learning[J]. Pattern Recognition, 2019, 88(1):174-184.
[2] CHEN Mulin, WANG Qi, LI Xuelong. Adaptive projected matrix factorization method for data clustering[J]. Neuro Computing, 2018, 306(1):182-188.
[3] KANG Zhao, PENG Chong, CHENG Qiang. Clustering with adaptive manifold structure learning[C] //2017 IEEE 33rd International Conference on Data Engineering(ICDE). San Diego: IEEE, 2017: 79-82.
[4] HUANG Shudong, REN Yazhou, XU Zenglin. Robust multiview data clustering with multi-view capped-norm k-means[J]. Neurocomputing, 2018, 311(1):197-208.
[5] HUANG Shudong, KANG Zhao, XU Zenglin. Self-weighted multiview clustering with soft capped norm[J]. Knowledge-Based Systems, 2018, 158(1):1-8.
[6] ALVAREZ-MEZA A M, LEE J A, VERLEYSEN M. Kernel-based dimensionality reduction using Renyis α-entropy measures of similarity[J]. Neurocomputing, 2017, 222(26):36-46.
[7] HAN I K, LUO Z, HUANG J Z. Variable weighting in fuzzy k-means clustering to determine the number of clusters[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(9):1838-1853.
[8] BICKEL P, CHOI D, CHANG X. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels[J]. Annals of Statistics, 2012, 41(4):1922-1943.
[9] DING C H Q, LI T, JORDAN M I. Convex and semi-nonnegative matrix factorizations[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2010, 32(1):45-55.
[10] SINHA K. K-means clustering using random matrix sparsification[C] //DY J, KRAUSE A. Proceedings of the 35th International Conference on Machine Learning. Stockholm: PMLR, 2018: 4684-4692.
[11] YE Xulun, ZHAO Jieyu, CHEN Yu, et al. Bayesian adversarial spectral clustering with unknown cluster number[J]. IEEE Transactions on Image Processing, 2020, 29(9):8506-8518.
[12] LONG Mingsheng, WANG Jianmin, DING Guiguang. Adaptation regularization: a general framework for transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(5):1076-1089.
[13] HUANG Shudong, ZHAO Peng, REN Yazhou, et al. Self-paced and soft-weighted nonnegative matrix factorization for data representation[J]. Knowledge-Based Systems, 2019, 164(1):29-37.
[14] YANG Yang, SHEN Fumin, HUANG Zi, et al. Discrete nonnegative spectral clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(9):1834-1845.
[15] KANG Zhao, PAN Haiqi, HOI S. Robust graph learning from noisy data[J]. IEEE Transactions on Cybernetics, 2020, 50(5):1833-1843.
[16] NIE Feiping, WANG Zheng, WANG Rong. Towards robust discriminative projections learning via non-greedy l_2,1-norm MinMax[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(6):2086-2100.
[17] MA Xiaoke, SUN Penggang, WANG Yu. Graph regularized nonnegative matrix factorization for temporal link prediction in dynamic networks[J]. Physica A: Statistical Mechanics and its Applications, 2018, 496(1):121-136.
[18] ZHANG Lefei, ZHANG Qian, DU Bo, et al. Adaptive manifold regularized matrix factorization for data clustering[C] //Twenty-sixth International Joint Conference on Artificial Intelligence. Melbourne: IJCAI, 2017: 3399-3405.
[19] LIU Xinwang, WANG Lei, ZHANG Jian, et al. Global and local structure preservation for feature selection[J]. IEEE Transactions on Neural Networks and Learning Systems, 2017, 25(6):1083-1095.
[20] HE Xiang, WANG Qi, LI Xuelong. Robust adaptive graph regularized nonnegative matrix factorization[J]. IEEE Access, 2019, 7(9):83101-83110.
[21] ZHAO Yongqiang, YANG Jingxiang. Hyperspectral image denoising via sparse representation and low-rank constraint[J]. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(1):296-308.
[22] LIN Baihong, TAO Xiaoming, XU Mai, et al. Bayesian hyperspectral and multispectral image fusions via double matrix factorization[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(10):5666-5678.
[23] WAN Minghua, LAI Zhihui, MING Zhong, et al. An improve face representation and recognition method based on graph regularized non-negative matrix factorization[J]. Multimedia Tools and Applications, 2019, 78(15):22109-22126.
[24] ZHANG Z, HAWKINS C. Variational Bayesian inference for robust streaming tensor factorization and completion[C] //2018 IEEE International Conference on Data Mining. Singapore: IEEE, 2018: 1446-1451.
[25] WU Siqi, YU Bin. Local identifiability of L₁-minimization dictionary learning: a sufficient and almost necessary condition[J]. Journal of Machine Learning Research, 2018, 18(168):1-56.
[26] ZHANG Kai, ZHANG Sheng, LIU Jun, et al. Greedy orthogonal pivoting algorithm for non-negative matrix factorization[C] //CHAUDHURI K, SALAKHUTDINOV R.Proceedings of the 36 th International Conference on Machine Learning. Long Beach: ICML, 2019: 7493-7501.
[27] KAILKHURA B, THIAGARAJAN J J, RASTOGI C, et al. A spectral approach for the design of experiments: design, analysis and algorithms[J]. Journal of Machine Learning Research, 2018, 19(34):1-46.
[28] LEI J, ROGERS R J. Combined Lagrangian multiplier and penalty function finite element technique for elastic impact analysis[J]. Computers & Structures,1988, 30(6):1219-1229.
[29] TOH K C, YUN S. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems[J]. Pacific Journal of Optimization, 2010, 6(3):615-640.
[30] YANG Junfeng. Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization[J]. Mathematics of Computation, 2012, 82(281):301-329.
[31] HUANG Jin, NIE Feiping, HUANG Heng, et al. Robust manifold non-negative matrix factorization[J]. ACM Transactions on Knowledge Discovery from Data, 2014, 8(3):11.
[32] 史加荣,周水生,郑秀云. 多线性鲁棒主成分分析[J]. 电子学报,2014,42(8):1480-1486. SHI Jiarong, ZHOU Shuisheng, ZHEN Xiuyun. Multilinear robust principal component analysis[J]. Acta Electronica Sinica, 2014, 42(8):1480-1486.
[33] HARTIGAN J, WONG M A. K-means clustering algorithm[J]. Applied Statistics, 2013, 28(1):100-108.
[34] XU H, CARAMANIS C, SANGHAVI S. Robust PCA via outlier pursuit[J]. IEEE Transactions on Information Theory, 2012, 58(5):3047-3064.
[35] LEE D D, SEUNG H S. Learning the parts of objects by non-negative matrix factorization[J]. Nature, 1999, 401(6755):788-791.
[36] CAI Deng, HE Xiaofei, HAN Jiawei, et al. Graph regularized non-negative matrix factorization for data representation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8):1548-1560.
[37] HUANG Jin, NIE Feiping, HUANG Heng, et al. Robust manifold non-negative matrix factorization[J]. ACM Transactions on Knowledge Discovery from Data, 2014, 8(3):1-21.
[38] PENG Chong, KANG Zhao, HU Yunhong. Robust graph regularized nonnegative matrix factorization for clustering[J]. ACM Transactions on Knowledge Discovery from Data, 2017, 11(3):1-30.
[39] YIN Ming, XIE Shengli, WU Zongze, et al. Subspace clustering via learning an adaptive low-rank graph[J]. IEEE Transactions on Image Processing, 2018, 27(8):3716-3728.
[40] HE Fang, NIE Feiping, WANG Rong. Fast semisupervised learning with bipartite graph for large-scale data[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 31(2):626-638.
[41] LI Xuelong, CHEN Mulin, WANG Qi. Adaptive consistency propagation nethod for graph clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(4):797-802.

多维度评价

Viewed

Full text

Abstract

Cited

Shared

Discussed