Journal of Shandong University (Natural Science), 2024, Vol. 59, Issue (3): 61-70. doi: 10.6040/j.issn.1671-9352.7.2023.1073
Chunyu SHI (史春雨)1,2, Yu MAO (毛煜)1,2,*, Haoyang LIU (刘浩阳)1,2, Yaojin LIN (林耀进)1,2
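Abstract:
A hierarchical feature selection algorithm based on instance correlations (HFSIC) is proposed to further improve the performance of feature selection for hierarchical classification. After irrelevant features are removed with a sparse regularization term, the parent-child relationships in the class hierarchy are combined with the reconstruction relationships among instances in the feature space to learn the instance correlations of the classes under the same subtree, and recursive regularization is used to optimize the output feature weight matrix. When instance correlations are measured, the reconstruction coefficient matrix is integrated into the training model, and the ℓ2,1-norm is used to remove irrelevant and redundant features. The optimization problem of the proposed model is solved with the accelerated proximal gradient method, and the algorithm is assessed under several evaluation metrics. Experimental results show that the proposed method outperforms the compared algorithms on 5 datasets, which verifies its effectiveness.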
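The abstract relies on two standard building blocks: the ℓ2,1-norm as a row-sparsity penalty on the feature weight matrix, and a proximal step of the kind used inside (accelerated) proximal gradient methods. The sketch below illustrates only those two generic pieces and is not the HFSIC model itself; the function names, the toy matrix W, and the threshold tau are hypothetical and chosen purely for illustration. Each row of W is assumed to hold the weights of one feature across classes, so a row shrunk to zero corresponds to a discarded feature.

```python
import math


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per feature)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def prox_l21(W, tau):
    """Row-wise shrinkage: the proximal operator of tau * ||W||_{2,1}.

    Rows whose norm is at most tau are set to zero; the remaining rows
    are scaled toward zero by the factor (1 - tau / norm).
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out


def selected_features(W, tol=1e-12):
    """Indices of features whose weight row is not (numerically) zero."""
    return [i for i, row in enumerate(W)
            if math.sqrt(sum(w * w for w in row)) > tol]


# Hypothetical 3-feature, 2-class weight matrix (illustrative values only).
W = [[3.0, 4.0],   # row norm 5.0: survives the shrinkage
     [0.3, 0.4],   # row norm 0.5: below tau, zeroed out
     [0.0, 0.0]]   # already zero
W_shrunk = prox_l21(W, tau=1.0)
print(selected_features(W_shrunk))
```

In a proximal gradient solver, a step of this kind would typically follow a gradient step on the smooth part of the objective; the smooth part used in the paper (hierarchy and instance-correlation terms) is not reproduced here.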