
Journal of Shandong University (Natural Science), 2024, Vol. 59, Issue 5: 70-81. doi: 10.6040/j.issn.1671-9352.7.2023.4523


基于次相关特征和邻域互信息的在线多标记特征选择算法

程雨轩1,2,毛煜1,2*,张小清1,2,曾艺祥1,2,林耀进1,2   

  1. 闽南师范大学计算机学院, 福建 漳州 363000;2. 数据科学与智能应用福建省高等学校重点实验室, 福建 漳州 363000
  • 发布日期:2024-05-09
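  • Corresponding author: MAO Yu (born 1985), male, lecturer, master's supervisor, Ph.D.; research interests: recommender systems and data mining. E-mail: maoyu_bit@163.com
  • Funding:
    Natural Science Foundation of Fujian Province (2022J01914)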

Online multi-label feature selection based on sub-correlation features and neighborhood mutual information

CHENG Yuxuan1,2, MAO Yu1,2*, ZHANG Xiaoqing1,2, ZENG Yixiang1,2, LIN Yaojin1,2   

  1. School of Computer Science, Minnan Normal University, Zhangzhou 363000, Fujian, China;
    2. Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou 363000, Fujian, China
  • Published:2024-05-09

摘要: 为了充分地挖掘被单一度量指标算法忽略但对分类结果有利的特征,提出了基于次相关特征和邻域互信息的在线多标记特征选择算法,通过计算得到的新到达特征的重要性以及相关度,分析其显著性的区别,将特征区分为显著特征以及次相关特征。利用邻域交互信息对新到达的特征与已选特征集合进行冗余性分析,剔除依赖度较低的特征,以此逐步提升特征子集的质量。构建了基于全局的线性和非线性关系的度量指标,并以此来计算特征的局部相关度,有效地挖掘次相关特征。充分考虑特征空间中次相关特征存在的问题,将次相关特征从特征集合中剥离并单独保存,使之在冗余分析阶段不会因显著特征对度量指标敏感度高所产生的影响而被剔除出特征集合。建立了特征选择指标,利用迭代策略根据指标进行特征选择。实验结果表明,该算法具有很好的有效性和稳定性。

关键词: 在线特征选择, 多标记学习, 邻域熵, 邻域互信息, 次相关特征

Abstract: To fully mine features that single-metric algorithms neglect but that benefit classification, this paper proposes an online multi-label feature selection algorithm based on sub-correlation features and neighborhood mutual information. The importance and correlation of each newly arrived feature are computed, the differences in significance are analyzed, and features are divided into salient features and sub-correlation features. Neighborhood interaction information is used to analyze the redundancy between a newly arrived feature and the selected feature set, and features with low dependency are eliminated, which gradually improves the quality of the feature subset. A measurement index based on global linear and nonlinear relationships is constructed and used to calculate the local correlation of features, which effectively mines the sub-correlation features. To account for the sub-correlation features present in the feature space, they are stripped from the feature set and saved separately, so that during the redundancy analysis stage they are not eliminated merely because salient features are highly sensitive to the measurement index. A feature selection criterion is established, and features are selected iteratively according to it. Experimental results show that the proposed algorithm is effective and stable.

Key words: online feature selection, multi-label learning, neighborhood entropy, neighborhood mutual information, sub-correlation feature
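
The abstract relies on neighborhood entropy and neighborhood mutual information without defining them on this page. The sketch below uses the commonly cited definitions, NH(S) = -(1/n) Σ log(|δ_S(x_i)| / n) and NMI(R; S) = -(1/n) Σ log(|δ_R(x_i)| · |δ_S(x_i)| / (n · |δ_{R∪S}(x_i)|)), where δ(x_i) is the set of samples within radius δ of x_i. It is an illustration only and not necessarily the exact formulation used by the authors: the Chebyshev distance, the default radius `delta=0.15`, and all function names are assumptions introduced here.

```python
import numpy as np


def _as_2d(X):
    """Convert input to a float array of shape (n_samples, n_columns)."""
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X


def neighborhood_sizes(X, delta):
    """Count, for each sample, the samples within Chebyshev distance delta.

    The sample itself is always counted, so every size is at least 1.
    """
    X = _as_2d(X)
    # Pairwise Chebyshev distance: largest absolute difference over columns.
    dist = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)
    return (dist <= delta).sum(axis=1)


def neighborhood_entropy(X, delta=0.15):
    """NH(X) = -(1/n) * sum_i log(|delta_X(x_i)| / n)."""
    X = _as_2d(X)
    n = X.shape[0]
    return -np.mean(np.log(neighborhood_sizes(X, delta) / n))


def neighborhood_mutual_information(A, B, delta=0.15):
    """NMI(A; B) = -(1/n) * sum_i log(|d_A| * |d_B| / (n * |d_AB|)).

    Under the Chebyshev distance, the neighborhood in the joint space
    equals the intersection of the neighborhoods in A and in B.
    """
    A, B = _as_2d(A), _as_2d(B)
    n = A.shape[0]
    size_a = neighborhood_sizes(A, delta)
    size_b = neighborhood_sizes(B, delta)
    size_ab = neighborhood_sizes(np.hstack([A, B]), delta)
    return -np.mean(np.log(size_a * size_b / (n * size_ab)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.random((100, 1))          # one streamed feature in [0, 1]
    labels = (feature > 0.5).astype(float)  # toy single-label column
    noise = rng.random((100, 1))            # unrelated feature
    print("NMI(feature; labels):", neighborhood_mutual_information(feature, labels))
    print("NMI(noise; labels):  ", neighborhood_mutual_information(noise, labels))
```

In an online setting, a score of this kind could be computed between each arriving feature and the label space to judge relevance, and between the arriving feature and already selected features to judge redundancy. The thresholds that separate salient from sub-correlation features are not given on this page and are left out of the sketch.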

CLC number: TP391