Online multi-label feature selection based on sub-correlation features and neighborhood mutual information

doi:10.6040/j.issn.1671-9352.7.2023.4523

Abstract

Abstract: Algorithms that rely on a single metric tend to overlook features that nevertheless benefit classification. To mine such features fully, this paper proposes an online multi-label feature selection algorithm based on sub-correlation features and neighborhood mutual information. The importance and correlation of each newly arrived feature are computed, and the difference in significance is used to divide features into salient features and sub-correlation features. Neighborhood interaction information is then used to analyze the redundancy between a newly arrived feature and the already selected feature set, and features with low dependency are eliminated, which gradually improves the quality of the feature subset. A measurement index based on global linear and nonlinear relationships is constructed and used to calculate the local correlation of features, so that sub-correlation features are mined effectively. Taking into account the presence of sub-correlation features in the feature space, the algorithm strips them from the feature set and stores them separately, so that they are not eliminated during redundancy analysis merely because the salient features are highly sensitive to the measurement index. A feature selection criterion is established, and features are selected iteratively according to it. Experimental results show that the proposed algorithm is effective and stable.

Key words: online feature selection, multi-label learning, neighborhood entropy, neighborhood mutual information, sub-correlation feature
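The abstract names neighborhood entropy and neighborhood mutual information but does not give their formulas. The sketch below assumes the definitions commonly used in the neighborhood rough set literature: for n samples, NH(A) = -(1/n) Σ log(|δ_A(x_i)|/n) and NMI(A;B) = -(1/n) Σ log(|δ_A(x_i)|·|δ_B(x_i)| / (n·|δ_{A∪B}(x_i)|)). The function names, the Chebyshev distance, and the radius `delta` are assumptions made for illustration; this is not the authors' implementation and it does not reproduce the paper's salient/sub-correlation split.

```python
import math


def neighborhoods(columns, delta):
    """Return, for each sample i, the set of samples within Chebyshev
    distance `delta` of i, measured over the given columns."""
    n = len(columns[0])
    return [
        {j for j in range(n)
         if all(abs(col[i] - col[j]) <= delta for col in columns)}
        for i in range(n)
    ]


def neighborhood_entropy(columns, delta):
    """Neighborhood entropy NH(A) under the assumed definition."""
    n = len(columns[0])
    return -sum(math.log(len(nb) / n)
                for nb in neighborhoods(columns, delta)) / n


def neighborhood_mutual_information(a_cols, b_cols, delta):
    """Neighborhood mutual information NMI(A;B) under the assumed definition.

    With the Chebyshev distance, the neighborhood over A and B jointly is
    exactly the intersection of the two separate neighborhoods.
    """
    n = len(a_cols[0])
    nb_a = neighborhoods(a_cols, delta)
    nb_b = neighborhoods(b_cols, delta)
    total = 0.0
    for i in range(n):
        joint = nb_a[i] & nb_b[i]  # never empty: sample i is always in both
        total += math.log(len(nb_a[i]) * len(nb_b[i]) / (n * len(joint)))
    return -total / n


# Toy data (hypothetical): one binary label and two candidate features.
label = [0, 0, 1, 1]
informative = [0.0, 0.1, 0.9, 1.0]    # separates the two label values
uninformative = [0.0, 1.0, 0.1, 0.9]  # mixes the two label values

print(neighborhood_mutual_information([informative], [label], delta=0.2))
print(neighborhood_mutual_information([uninformative], [label], delta=0.2))
```

On this toy data the informative feature reaches the maximum possible value for a balanced binary label (ln 2), while the uninformative one scores zero. In a streaming setting, a score of this kind could be computed for each arriving feature against the label columns (relevance) and against already selected features (redundancy); the thresholds that would separate salient from sub-correlation features are not stated in the abstract and are left out here.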

CLC number: TP391

CHENG Yuxuan, MAO Yu, ZHANG Xiaoqing, ZENG Yixiang, LIN Yaojin. Online multi-label feature selection based on sub-correlation features and neighborhood mutual information[J]. Journal of Shandong University (Natural Science), 2024, 59(5): 70-81.
