
Journal of Shandong University (Natural Science), 2021, 56(7): 91-102. DOI: 10.6040/j.issn.1671-9352.0.2020.588


Multi-label feature selection based on manifold structure and flexible embedding

ZHANG Yao, MA Ying-cang*, YANG Xiao-fei, ZHU Heng-dong, YANG Ting

  1. School of Science, Xi'an Polytechnic University, Xi'an 710600, Shaanxi, China
  • Published: 2021-07-19
  • About the authors: ZHANG Yao (1994— ), male, master's student; research interests: machine learning and clustering. E-mail: ayunxiaobao@163.com. *Corresponding author: MA Ying-cang (1972— ), male, PhD, professor; research interests: artificial intelligence and machine learning. E-mail: mayingcang@126.com
  • Supported by:
    National Natural Science Foundation of China (61976130); Natural Science Foundation of Shaanxi Province (2020JQ-923); Key Research and Development Program of Shaanxi Province (2018KW-021)




Abstract: A linear regression model and manifold structure are combined into a joint framework for weak linear multi-label feature selection. First, a least-squares loss function is used to learn the regression coefficient matrix; second, the label manifold structure is used to learn the feature weight matrix of the data; third, the L2,1-norm is imposed on both the regression coefficient matrix and the feature weight matrix, which induces sparsity and thereby facilitates feature selection. In addition, an iterative update algorithm is designed to solve the resulting optimization problem, and its convergence is proved. Finally, the proposed method is evaluated on several classical multi-label datasets, and the experimental results demonstrate its effectiveness.
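The abstract states the objective only in words. As a reading aid, a joint objective of the kind described might take the following general shape; this is a hedged sketch under assumed notation, not the paper's exact formulation. Here X ∈ R^{n×d} is the data matrix, Y ∈ R^{n×c} the label matrix, P the regression coefficient matrix, W the feature weight matrix, L_Y a graph Laplacian built from the label manifold, and α, β trade-off parameters:

```latex
\min_{P,\,W}\;
\underbrace{\lVert XP - Y \rVert_F^2}_{\text{least-squares loss}}
\;+\; \alpha\, \underbrace{\operatorname{Tr}\!\left((XW)^{\top} L_Y\,(XW)\right)}_{\text{label-manifold term}}
\;+\; \beta \left( \lVert P \rVert_{2,1} + \lVert W \rVert_{2,1} \right),
\qquad
\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \sqrt{\textstyle\sum_{j=1}^{c} w_{ij}^{2}}.
```

Because the L2,1-norm zeroes out whole rows of W, features can then be ranked by the row norms ‖w_i‖_2 and the top-ranked ones selected.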

Key words: multi-label learning, feature selection, logistic regression, L2,1-norm, manifold structure
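The iterative update algorithm itself is not given on this page. Below is a minimal, self-contained sketch of the standard iteratively reweighted scheme commonly used to handle an L2,1-norm term; it covers only the least-squares plus L2,1 part of the objective above (the label-manifold term is omitted), and the function name and all parameter values are illustrative, not from the paper:

```python
import numpy as np

def l21_regression(X, Y, beta=0.5, n_iter=50, eps=1e-8):
    """Minimize ||X W - Y||_F^2 + beta * ||W||_{2,1} via the standard
    reweighting trick: for fixed D = diag(1 / (2 ||w_i||_2)) the problem
    is quadratic, and each step solves (X^T X + beta D) W = X^T Y."""
    W = np.linalg.lstsq(X, Y, rcond=None)[0]      # warm start: ordinary least squares
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))      # reweighting matrix from current W
        W = np.linalg.solve(XtX + beta * D, XtY)  # closed-form update for fixed D
    return W

# Toy usage: rank features by the row norms of W (larger = more relevant).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = (X[:, :3] @ rng.standard_normal((3, 4)) > 0).astype(float)  # synthetic multi-label targets
W = l21_regression(X, Y)
ranking = np.argsort(-np.linalg.norm(W, axis=1))
print("top-5 features:", ranking[:5])
```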

CLC number:

  • TP181