Journal of Shandong University (Natural Science), 2022, Vol. 57, Issue (4): 1-11. doi: 10.6040/j.issn.1671-9352.7.2021.167
SUN Lin1,2, CHEN Yu-sheng1, XU Jiu-cheng1,2
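Abstract: The traditional ReliefF algorithm can handle only single-label data, and its improved variants do not fully exploit the correlation between samples. To address these problems, a multi-label feature selection algorithm based on an improved ReliefF is proposed. First, the cosine similarity function is used to measure the similarity between sample features, the Jaccard distance is used to measure the correlation between the labels of samples, and a sample similarity function is defined to measure the similarity relationship between samples over the whole sample space. Then, a discriminant formula for same-class and different-class samples is defined and used to identify the nearest same-class and different-class neighbors of a randomly chosen sample. Finally, a new iterative formula for feature weights is proposed to improve ReliefF, and a multi-label feature selection algorithm is designed. The classification performance of the proposed algorithm is analyzed and tested on seven public multi-label datasets using five evaluation metrics: average precision, coverage, one-error, ranking loss, and Hamming loss. The experimental results show that the proposed algorithm is effective.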
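The two measures named in the abstract, cosine similarity over feature vectors and Jaccard distance over label sets, can be illustrated with a minimal Python sketch. The abstract does not state how the paper combines them into its sample similarity function, so `sample_similarity` below is a hypothetical combination chosen only for illustration. All function names and the toy data are assumptions, not the authors' definitions.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def jaccard_distance(labels_a, labels_b):
    """Jaccard distance between two label sets: 1 - |A & B| / |A | B|."""
    a, b = set(labels_a), set(labels_b)
    union = a | b
    if not union:
        return 0.0  # two empty label sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def sample_similarity(x, labels_x, y, labels_y):
    """Hypothetical combined similarity: feature similarity times label similarity.

    ASSUMPTION: the abstract does not give the paper's formula; this product
    is an illustrative stand-in only.
    """
    label_similarity = 1.0 - jaccard_distance(labels_x, labels_y)
    return cosine_similarity(x, y) * label_similarity


# Toy samples (made up for illustration): feature vector plus label set.
x1, l1 = [1.0, 0.0, 1.0], {"sports", "news"}
x2, l2 = [1.0, 0.0, 1.0], {"sports"}
x3, l3 = [0.0, 1.0, 0.0], {"music"}

sim_12 = sample_similarity(x1, l1, x2, l2)  # same features, half-overlapping labels
sim_13 = sample_similarity(x1, l1, x3, l3)  # orthogonal features, disjoint labels
```

In a ReliefF-style procedure, a pairwise score of this kind would be used to rank the neighbors of a randomly drawn sample before the feature weights are updated. The paper's actual discriminant and weight-update formulas are not given in the abstract and are not reproduced here.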