
Journal of Shandong University (Natural Science), 2021, 56(7): 91-102. DOI: 10.6040/j.issn.1671-9352.0.2020.588


Multi-label feature selection based on manifold structure and flexible embedding

ZHANG Yao, MA Ying-cang*, YANG Xiao-fei, ZHU Heng-dong, YANG Ting

  1. School of Science, Xi'an Polytechnic University, Xi'an 710600, Shaanxi, China
  • Published: 2021-07-19
  • About the authors: ZHANG Yao (1994— ), male, master's student; research interests: machine learning and clustering. E-mail: ayunxiaobao@163.com. *Corresponding author: MA Ying-cang (1972— ), male, PhD, professor; research interests: artificial intelligence and machine learning. E-mail: mayingcang@126.com
  • Supported by:
    National Natural Science Foundation of China (61976130); Natural Science Foundation of Shaanxi Province (2020JQ-923); Key Research and Development Program of Shaanxi Province (2018KW-021)




Abstract: A linear regression model and manifold structure are combined into a joint framework for weak linear multi-label feature selection. First, a least-squares loss function is used to learn the regression coefficient matrix; second, the label manifold structure is used to learn the feature weight matrix of the data; third, the L2,1-norm is imposed on both the regression coefficient matrix and the feature weight matrix, which induces sparsity and thereby facilitates feature selection. In addition, an iterative update algorithm is designed to solve the resulting optimization problem, and its convergence is proved. Finally, the proposed method is evaluated on several classical multi-label datasets, and the experimental results demonstrate its effectiveness.
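The abstract states the objective only in words. As a reading aid, a joint objective of the kind described might take the following general shape; this is a hedged sketch under assumed notation, not the paper's exact formulation. Here X ∈ R^{n×d} is the data matrix, Y ∈ R^{n×c} the label matrix, P the regression coefficient matrix, W the feature weight matrix, L_Y a graph Laplacian built from the label manifold, and α, β trade-off parameters:

```latex
\min_{P,\,W}\;
\underbrace{\lVert XP - Y \rVert_F^2}_{\text{least-squares loss}}
\;+\; \alpha\, \underbrace{\operatorname{Tr}\!\left((XW)^{\top} L_Y\,(XW)\right)}_{\text{label-manifold term}}
\;+\; \beta \left( \lVert P \rVert_{2,1} + \lVert W \rVert_{2,1} \right),
\qquad
\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \sqrt{\textstyle\sum_{j=1}^{c} w_{ij}^{2}}.
```

Because the L2,1-norm zeroes out whole rows of W, features can then be ranked by the row norms ‖w_i‖_2 and the top-ranked ones selected.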

Key words: multi-label learning, feature selection, logistic regression, L2,1-norm, manifold structure
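The iterative update algorithm itself is not given on this page. Below is a minimal, self-contained sketch of the standard iteratively reweighted scheme commonly used to handle an L2,1-norm term; it covers only the least-squares plus L2,1 part of the objective above (the label-manifold term is omitted), and the function name and all parameter values are illustrative, not from the paper:

```python
import numpy as np

def l21_regression(X, Y, beta=0.5, n_iter=50, eps=1e-8):
    """Minimize ||X W - Y||_F^2 + beta * ||W||_{2,1} via the standard
    reweighting trick: for fixed D = diag(1 / (2 ||w_i||_2)) the problem
    is quadratic, and each step solves (X^T X + beta D) W = X^T Y."""
    W = np.linalg.lstsq(X, Y, rcond=None)[0]      # warm start: ordinary least squares
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))      # reweighting matrix from current W
        W = np.linalg.solve(XtX + beta * D, XtY)  # closed-form update for fixed D
    return W

# Toy usage: rank features by the row norms of W (larger = more relevant).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = (X[:, :3] @ rng.standard_normal((3, 4)) > 0).astype(float)  # synthetic multi-label targets
W = l21_regression(X, Y)
ranking = np.argsort(-np.linalg.norm(W, axis=1))
print("top-5 features:", ranking[:5])
```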

CLC number:

  • TP181