Journal of Shandong University (Natural Science), 2022, Vol. 57, Issue 4: 1-11. doi: 10.6040/j.issn.1671-9352.7.2021.167

Multilabel feature selection algorithm based on improved ReliefF

SUN Lin1,2, CHEN Yu-sheng1, XU Jiu-cheng1,2

  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, Henan, China;
  2. Henan Engineering Laboratory of Smart Business and Internet of Things Technology, Xinxiang 453007, Henan, China
  • Published: 2022-03-29
  • About the author: SUN Lin (born 1979), male, PhD, associate professor and master's supervisor; research interests include granular computing, data mining, and bioinformatics. E-mail: sunlin@htu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62076089, 61976082); Henan Province Science and Technology Research Project (212102210136); Henan Normal University Graduate Research and Innovation Project (YL202131)

Abstract: The traditional ReliefF algorithm can process only single-label data, and its existing extensions do not make full use of the correlations between samples. To address these problems, a multilabel feature selection algorithm based on an improved ReliefF is proposed. First, the cosine similarity function is used to measure the similarity between the features of samples, the Jaccard distance is used to measure the correlation between the label sets of samples, and a similarity function between samples is defined to measure their similarity relationship over the entire sample space. Second, a discrimination formula is defined to decide whether two samples are homogeneous or heterogeneous, and it is used to identify the nearest homogeneous and heterogeneous neighbors of a randomly selected sample. Finally, a new iterative formula for the feature weights is proposed to improve ReliefF, and the multilabel feature selection algorithm is designed on this basis. The classification performance of the proposed algorithm is analyzed and tested on seven public multilabel datasets using five evaluation metrics: Average Precision, Coverage, One-error, Ranking Loss, and Hamming Loss. The experimental results show that the proposed algorithm is effective.

Key words: multilabel, feature selection, label correlation, ReliefF
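The abstract outlines the algorithm only at a high level; the paper's exact sample similarity function, homogeneous/heterogeneous discrimination formula, and new weight-iteration formula are not given on this page. The following is a minimal Python sketch of the general ReliefF-style pipeline the abstract describes, under loudly stated assumptions. The combined similarity is a hypothetical convex mix `alpha * cosine + (1 - alpha) * (1 - Jaccard distance)`. A neighbor counts as homogeneous when its label-set Jaccard similarity to the sampled instance is at least a hypothetical threshold `theta`. The weights are updated with the classic ReliefF rule (subtract the normalized feature difference to the nearest homogeneous neighbors, add it for the nearest heterogeneous ones), not the authors' new formula. All function and parameter names are illustrative.

```python
import math
import random
from typing import List, Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def jaccard_distance(y1: Set[int], y2: Set[int]) -> float:
    """1 - |Y1 & Y2| / |Y1 | Y2|; two empty label sets count as identical."""
    union = y1 | y2
    return 1.0 - len(y1 & y2) / len(union) if union else 0.0


def sample_similarity(x1: Sequence[float], x2: Sequence[float],
                      y1: Set[int], y2: Set[int], alpha: float = 0.5) -> float:
    """HYPOTHETICAL combination: convex mix of feature cosine and label Jaccard similarity."""
    return alpha * cosine_similarity(x1, x2) + (1 - alpha) * (1 - jaccard_distance(y1, y2))


def relieff_multilabel(X: List[List[float]], Y: List[Set[int]], m: int, k: int = 1,
                       alpha: float = 0.5, theta: float = 0.5, seed: int = 0) -> List[float]:
    """One weight per feature (larger = more relevant), classic ReliefF-style update."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to normalise differences to [0, 1].
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]
    weights = [0.0] * d
    rng = random.Random(seed)
    for i in rng.sample(range(n), m):  # m randomly drawn instances
        hits, misses = [], []
        for j in range(n):
            if j == i:
                continue
            sim = sample_similarity(X[i], X[j], Y[i], Y[j], alpha)
            # HYPOTHETICAL discrimination rule: homogeneous if label sets overlap enough.
            homogeneous = 1.0 - jaccard_distance(Y[i], Y[j]) >= theta
            (hits if homogeneous else misses).append((sim, j))
        # Nearest neighbours = the k most similar samples on each side.
        near_hits = [j for _, j in sorted(hits, reverse=True)[:k]]
        near_misses = [j for _, j in sorted(misses, reverse=True)[:k]]
        for f in range(d):
            for j in near_hits:    # differing from a similar-label neighbour: penalise
                weights[f] -= abs(X[i][f] - X[j][f]) / span[f] / (m * len(near_hits))
            for j in near_misses:  # differing from a dissimilar-label neighbour: reward
                weights[f] += abs(X[i][f] - X[j][f]) / span[f] / (m * len(near_misses))
    return weights


# Toy usage: feature 0 separates the two label sets, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
Y = [{0}, {0}, {1}, {1}]
w = relieff_multilabel(X, Y, m=len(X), k=1)
ranking = sorted(range(len(w)), key=lambda f: w[f], reverse=True)
print(ranking)  # prints [0, 1]: feature 0 is ranked first
```

Features are then ranked by descending weight and the top-ranked ones are kept. Blending feature similarity with label similarity lets the neighbor search reflect both views of a sample, which is the idea the abstract describes; how the two are actually weighted in the paper is not stated here.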

CLC number: TP181
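The five evaluation metrics named in the abstract have standard definitions in the multilabel learning literature. The plain-Python sketch below implements those common definitions as a reference; it is not code from the paper. Several choices are assumptions: ties in label scores go to the lower label index, instances with no relevant label are skipped by Coverage and Average Precision, and instances with no relevant or no irrelevant label are skipped by Ranking Loss. Lower is better for Hamming Loss, One-error, Coverage, and Ranking Loss; higher is better for Average Precision.

```python
from typing import List, Sequence, Set

Scores = List[Sequence[float]]  # scores[i][j]: confidence that instance i has label j
Labels = List[Set[int]]         # labels[i]: set of relevant label indices


def label_ranks(scores: Sequence[float]) -> List[int]:
    """rank[j] = 1 for the top-scored label; ties go to the lower label index."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    rank = [0] * len(scores)
    for pos, j in enumerate(order, start=1):
        rank[j] = pos
    return rank


def hamming_loss(y_pred: Labels, y_true: Labels, q: int) -> float:
    """Fraction of instance-label pairs that are misclassified (q = number of labels)."""
    return sum(len(p ^ t) for p, t in zip(y_pred, y_true)) / (len(y_true) * q)


def one_error(scores: Scores, y_true: Labels) -> float:
    """Fraction of instances whose top-ranked label is not relevant."""
    misses = sum(label_ranks(s).index(1) not in y for s, y in zip(scores, y_true))
    return misses / len(y_true)


def coverage(scores: Scores, y_true: Labels) -> float:
    """Average depth (rank - 1) needed to cover all relevant labels."""
    vals = [max(label_ranks(s)[j] for j in y) - 1 for s, y in zip(scores, y_true) if y]
    return sum(vals) / len(vals)


def ranking_loss(scores: Scores, y_true: Labels) -> float:
    """Average fraction of (relevant, irrelevant) label pairs that are ordered wrongly."""
    vals = []
    for s, y in zip(scores, y_true):
        neg = [j for j in range(len(s)) if j not in y]
        if not y or not neg:
            continue  # no (relevant, irrelevant) pairs to compare
        bad = sum(1 for a in y for b in neg if s[a] <= s[b])
        vals.append(bad / (len(y) * len(neg)))
    return sum(vals) / len(vals)


def average_precision(scores: Scores, y_true: Labels) -> float:
    """Mean, over relevant labels, of the share of relevant labels ranked at or above it."""
    vals = []
    for s, y in zip(scores, y_true):
        if not y:
            continue
        r = label_ranks(s)
        vals.append(sum(sum(r[b] <= r[a] for b in y) / r[a] for a in y) / len(y))
    return sum(vals) / len(vals)


scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]
y_true = [{0, 2}, {2}]
y_pred = [{j for j, v in enumerate(s) if v >= 0.5} for s in scores]  # assumed 0.5 threshold
print(hamming_loss(y_pred, y_true, 3), one_error(scores, y_true), coverage(scores, y_true),
      ranking_loss(scores, y_true), average_precision(scores, y_true))
# prints 0.3333333333333333 0.5 1.0 0.25 0.75
```

Hamming Loss is the only one of the five that needs thresholded predictions; the other four work directly on the real-valued label scores, which is why both inputs appear above.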