您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

《山东大学学报(理学版)》 ›› 2022, Vol. 57 ›› Issue (8): 21-38.doi: 10.6040/j.issn.1671-9352.7.2021.069

• • 上一篇    下一篇

基于自适应图调节和低秩矩阵分解的鲁棒聚类

李心雨1,2,范辉1,2*,刘惊雷3   

  1. 1.山东工商学院计算机科学与技术学院, 山东 烟台 264005;2.山东省高等学校未来智能计算协同创新中心, 山东 烟台 264005;3.烟台大学计算机与控制工程学院, 山东 烟台 264005
  • 出版日期:2022-08-20 发布日期:2022-06-29
  • 作者简介:李心雨(1997— ),女,硕士研究生,研究方向为自适应图学习. E-mail:1471415972@qq.com*通信作者简介:范辉(1962— ),男,博士,教授,研究方向为计算机辅助几何设计、计算机图形学、图像处理. E-mail:fanlinw@263.net
  • 基金资助:
    国家自然科学基金资助项目(62002200,61772319);山东省自然科学基金资助项目(ZR2020QF012)

Robust clustering based on adaptive graph regularization and low-rank matrix decomposition

LI Xin-yu1,2, FAN Hui1,2 *, LIU Jing-lei3   

  1. 1. School of Computer Science and Technology, Shandong Technology and Business University, Yantai 264005, Shandong, China;
    2. Co-innovation Center of Shandong Colleges and Universities Future Intelligent Computing, Yantai 264005, Shandong, China;
    3. School of Computer and Control Engineering, Yantai University, Yantai 264005, Shandong, China
  • Online:2022-08-20 Published:2022-06-29

摘要: 聚类是数据挖掘和机器学习领域的重要研究内容,一般会先基于数据样本构建相似图,再基于相似图将样本划分到相应的类中。但是真实的数据经常被损坏,导致学习的相似图不准确,从而直接影响聚类结果。为解决这些问题,提出一种面向鲁棒聚类的自适应图调节和低秩矩阵分解的方法,该方法的核心思想是:将原始数据X分解为纯净数据D和噪声数据S,再基于纯净数据构造拉普拉斯矩阵并进行自适应图调节。随后,给出一个联合学习框架,将数据分离、自适应图正则、噪声消除和低秩矩阵分解集成到一个目标函数中。利用增广拉格朗日乘子法分别更新变量。最后,在理论上证明算法的收敛性并进行实验。实验结果表明所提出的方法与现有一些方法相比有一定优越性。

关键词: 鲁棒聚类, L2,1范数, 低秩矩阵分解, 自适应图调节, 增广拉格朗日乘子法

Abstract: Clustering is an important research content in the field of data mining and machine learning. Generally, similarity graphs are constructed based on data samples, and then the samples are divided into corresponding classes based on the similarity graphs. However, the real data is often damaged, resulting in inaccurate similarity graphs, which directly affects the clustering results. In order to solve these problems, a robust clustering-oriented adaptive graph adjustment and low-rank matrix decomposition method is proposed. The core idea of the method is to decompose the original data X into pure data D and noisy data S, then construct a Laplacian matrix based on pure data and perform adaptive graph adjustment. Subsequently, a joint learning framework is given, which integrates data separation, adaptive graph regularization, noise removal and low-rank matrix decomposition into an objective function. Use augmented Lagrangian multiplier method to update variables separately. Finally, the paper theoretically proves the convergence of the algorithm and conduct experiments. The experimental results show that the proposed method has certain advantages compared with some existing methods.

Key words: robust clustering, L2,1 norm, low-rank matrix decomposition, adaptive graph regularization, augmented Lagrange multiplier method

中图分类号: 

  • TP18
[1] HUANG Shudong, KANG Zhao, TSANG I. Auto-weighted multiview clustering via kernelized graph learning[J]. Pattern Recognition, 2019, 88(1):174-184.
[2] CHEN Mulin, WANG Qi, LI Xuelong. Adaptive projected matrix factorization method for data clustering[J]. Neuro Computing, 2018, 306(1):182-188.
[3] KANG Zhao, PENG Chong, CHENG Qiang. Clustering with adaptive manifold structure learning[C] //2017 IEEE 33rd International Conference on Data Engineering(ICDE). San Diego: IEEE, 2017: 79-82.
[4] HUANG Shudong, REN Yazhou, XU Zenglin. Robust multiview data clustering with multi-view capped-norm k-means[J]. Neurocomputing, 2018, 311(1):197-208.
[5] HUANG Shudong, KANG Zhao, XU Zenglin. Self-weighted multiview clustering with soft capped norm[J]. Knowledge-Based Systems, 2018, 158(1):1-8.
[6] ALVAREZ-MEZA A M, LEE J A, VERLEYSEN M. Kernel-based dimensionality reduction using Renyis α-entropy measures of similarity[J]. Neurocomputing, 2017, 222(26):36-46.
[7] HAN I K, LUO Z, HUANG J Z. Variable weighting in fuzzy k-means clustering to determine the number of clusters[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(9):1838-1853.
[8] BICKEL P, CHOI D, CHANG X. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels[J]. Annals of Statistics, 2012, 41(4):1922-1943.
[9] DING C H Q, LI T, JORDAN M I. Convex and semi-nonnegative matrix factorizations[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2010, 32(1):45-55.
[10] SINHA K. K-means clustering using random matrix sparsification[C] //DY J, KRAUSE A. Proceedings of the 35th International Conference on Machine Learning. Stockholm: PMLR, 2018: 4684-4692.
[11] YE Xulun, ZHAO Jieyu, CHEN Yu, et al. Bayesian adversarial spectral clustering with unknown cluster number[J]. IEEE Transactions on Image Processing, 2020, 29(9):8506-8518.
[12] LONG Mingsheng, WANG Jianmin, DING Guiguang. Adaptation regularization: a general framework for transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(5):1076-1089.
[13] HUANG Shudong, ZHAO Peng, REN Yazhou, et al. Self-paced and soft-weighted nonnegative matrix factorization for data representation[J]. Knowledge-Based Systems, 2019, 164(1):29-37.
[14] YANG Yang, SHEN Fumin, HUANG Zi, et al. Discrete nonnegative spectral clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2017, 29(9):1834-1845.
[15] KANG Zhao, PAN Haiqi, HOI S. Robust graph learning from noisy data[J]. IEEE Transactions on Cybernetics, 2020, 50(5):1833-1843.
[16] NIE Feiping, WANG Zheng, WANG Rong. Towards robust discriminative projections learning via non-greedy l2,1-norm MinMax[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(6):2086-2100.
[17] MA Xiaoke, SUN Penggang, WANG Yu. Graph regularized nonnegative matrix factorization for temporal link prediction in dynamic networks[J]. Physica A: Statistical Mechanics and its Applications, 2018, 496(1):121-136.
[18] ZHANG Lefei, ZHANG Qian, DU Bo, et al. Adaptive manifold regularized matrix factorization for data clustering[C] //Twenty-sixth International Joint Conference on Artificial Intelligence. Melbourne: IJCAI, 2017: 3399-3405.
[19] LIU Xinwang, WANG Lei, ZHANG Jian, et al. Global and local structure preservation for feature selection[J]. IEEE Transactions on Neural Networks and Learning Systems, 2017, 25(6):1083-1095.
[20] HE Xiang, WANG Qi, LI Xuelong. Robust adaptive graph regularized nonnegative matrix factorization[J]. IEEE Access, 2019, 7(9):83101-83110.
[21] ZHAO Yongqiang, YANG Jingxiang. Hyperspectral image denoising via sparse representation and low-rank constraint[J]. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(1):296-308.
[22] LIN Baihong, TAO Xiaoming, XU Mai, et al. Bayesian hyperspectral and multispectral image fusions via double matrix factorization[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(10):5666-5678.
[23] WAN Minghua, LAI Zhihui, MING Zhong, et al. An improve face representation and recognition method based on graph regularized non-negative matrix factorization[J]. Multimedia Tools and Applications, 2019, 78(15):22109-22126.
[24] ZHANG Z, HAWKINS C. Variational Bayesian inference for robust streaming tensor factorization and completion[C] //2018 IEEE International Conference on Data Mining. Singapore: IEEE, 2018: 1446-1451.
[25] WU Siqi, YU Bin. Local identifiability of L1-minimization dictionary learning: a sufficient and almost necessary condition[J]. Journal of Machine Learning Research, 2018, 18(168):1-56.
[26] ZHANG Kai, ZHANG Sheng, LIU Jun, et al. Greedy orthogonal pivoting algorithm for non-negative matrix factorization[C] //CHAUDHURI K, SALAKHUTDINOV R.Proceedings of the 36 th International Conference on Machine Learning. Long Beach: ICML, 2019: 7493-7501.
[27] KAILKHURA B, THIAGARAJAN J J, RASTOGI C, et al. A spectral approach for the design of experiments: design, analysis and algorithms[J]. Journal of Machine Learning Research, 2018, 19(34):1-46.
[28] LEI J, ROGERS R J. Combined Lagrangian multiplier and penalty function finite element technique for elastic impact analysis[J]. Computers & Structures,1988, 30(6):1219-1229.
[29] TOH K C, YUN S. An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems[J]. Pacific Journal of Optimization, 2010, 6(3):615-640.
[30] YANG Junfeng. Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization[J]. Mathematics of Computation, 2012, 82(281):301-329.
[31] HUANG Jin, NIE Feiping, HUANG Heng, et al. Robust manifold non-negative matrix factorization[J]. ACM Transactions on Knowledge Discovery from Data, 2014, 8(3):11.
[32] 史加荣,周水生,郑秀云. 多线性鲁棒主成分分析[J]. 电子学报,2014,42(8):1480-1486. SHI Jiarong, ZHOU Shuisheng, ZHEN Xiuyun. Multilinear robust principal component analysis[J]. Acta Electronica Sinica, 2014, 42(8):1480-1486.
[33] HARTIGAN J, WONG M A. K-means clustering algorithm[J]. Applied Statistics, 2013, 28(1):100-108.
[34] XU H, CARAMANIS C, SANGHAVI S. Robust PCA via outlier pursuit[J]. IEEE Transactions on Information Theory, 2012, 58(5):3047-3064.
[35] LEE D D, SEUNG H S. Learning the parts of objects by non-negative matrix factorization[J]. Nature, 1999, 401(6755):788-791.
[36] CAI Deng, HE Xiaofei, HAN Jiawei, et al. Graph regularized non-negative matrix factorization for data representation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(8):1548-1560.
[37] HUANG Jin, NIE Feiping, HUANG Heng, et al. Robust manifold non-negative matrix factorization[J]. ACM Transactions on Knowledge Discovery from Data, 2014, 8(3):1-21.
[38] PENG Chong, KANG Zhao, HU Yunhong. Robust graph regularized nonnegative matrix factorization for clustering[J]. ACM Transactions on Knowledge Discovery from Data, 2017, 11(3):1-30.
[39] YIN Ming, XIE Shengli, WU Zongze, et al. Subspace clustering via learning an adaptive low-rank graph[J]. IEEE Transactions on Image Processing, 2018, 27(8):3716-3728.
[40] HE Fang, NIE Feiping, WANG Rong. Fast semisupervised learning with bipartite graph for large-scale data[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 31(2):626-638.
[41] LI Xuelong, CHEN Mulin, WANG Qi. Adaptive consistency propagation nethod for graph clustering[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32(4):797-802.
[1] 及歆荣,侯翠琴,侯义斌,赵斌. 基于筛选机制的L1核学习机分布式训练方法[J]. 山东大学学报(理学版), 2016, 51(9): 137-144.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 杨军. 金属基纳米材料表征和纳米结构调控[J]. 山东大学学报(理学版), 2013, 48(1): 1 -22 .
[2] 何海伦, 陈秀兰*. 变性剂和缓冲系统对适冷蛋白酶MCP-01和中温蛋白酶BP-01构象影响的圆二色光谱分析何海伦, 陈秀兰*[J]. 山东大学学报(理学版), 2013, 48(1): 23 -29 .
[3] 赵君1,赵晶2,樊廷俊1*,袁文鹏1,3,张铮1,丛日山1. 水溶性海星皂苷的分离纯化及其抗肿瘤活性研究[J]. J4, 2013, 48(1): 30 -35 .
[4] 孙小婷1,靳岚2*. DOSY在寡糖混合物分析中的应用[J]. J4, 2013, 48(1): 43 -45 .
[5] 罗斯特,卢丽倩,崔若飞,周伟伟,李增勇*. Monte-Carlo仿真酒精特征波长光子在皮肤中的传输规律及光纤探头设计[J]. J4, 2013, 48(1): 46 -50 .
[6] 杨伦,徐正刚,王慧*,陈其美,陈伟,胡艳霞,石元,祝洪磊,曾勇庆*. RNA干扰沉默PID1基因在C2C12细胞中表达的研究[J]. J4, 2013, 48(1): 36 -42 .
[7] 冒爱琴1, 2, 杨明君2, 3, 俞海云2, 张品1, 潘仁明1*. 五氟乙烷灭火剂高温热解机理研究[J]. J4, 2013, 48(1): 51 -55 .
[8] 杨莹,江龙*,索新丽. 容度空间上保费泛函的Choquet积分表示及相关性质[J]. J4, 2013, 48(1): 78 -82 .
[9] 李永明1, 丁立旺2. PA误差下半参数回归模型估计的r-阶矩相合[J]. J4, 2013, 48(1): 83 -88 .
[10] 董伟伟. 一种具有独立子系统的决策单元DEA排序新方法[J]. J4, 2013, 48(1): 89 -92 .