
Journal of Shandong University (Natural Science), 2024, Vol. 59, Issue (5): 90-99. doi: 10.6040/j.issn.1671-9352.7.2023.148


Multi-label online stream feature selection based on high-dimensional correlation

ZHU Liquan (1, 2), LIN Yaojin (1, 2), MAO Yu (1, 2), CHENG Yuxuan (1, 2)

  1. School of Computer Science, Minnan Normal University, Zhangzhou 363000, Fujian, China;
  2. Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou 363000, Fujian, China
  • Published: 2024-05-09
  • Funding:
    Natural Science Foundation of Fujian Province (2022J01914)

Abstract: This paper proposes a multi-label online streaming feature selection algorithm based on high-dimensional correlation. The algorithm applies an equivalence mapping to the label space and builds a weighted undirected graph over the high-dimensional label space, using the graph information and the Jaccard index to measure high-dimensional weights between labels. The significance of each newly arrived feature is then computed from these high-dimensional label correlations, and its significance level is judged against an iteratively updated mean significance. An online feature selection procedure that balances global and local criteria dynamically optimizes the selected feature subset: it filters out irrelevant features by considering the global correlation between the selected features and the label space, and it removes redundant features by analyzing the local correlation among the selected features. Comparative experiments against six multi-label feature selection methods validate the effectiveness of the proposed algorithm.

Key words: multi-label feature selection, online streaming feature, high-dimensional correlation, label weight
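The abstract outlines the pipeline only at a high level, so the following is a minimal Python sketch of that style of streaming selector, not the authors' algorithm. Several choices are assumptions not taken from the paper: features are already discretized; a label's weight is its normalized weighted degree in the Jaccard-weighted label graph; relevance is empirical mutual information; the global step drops any selected feature whose significance falls below the running mean; and the local redundancy rule (a pairwise mutual-information test) is a hypothetical stand-in. The names `mutual_info`, `jaccard`, `label_weights`, and `OnlineSelector` are illustrative only.

```python
import math
from collections import Counter


def mutual_info(x, y):
    """Empirical mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum of p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def jaccard(u, v):
    """Jaccard index of two 0/1 label columns: |u AND v| / |u OR v|."""
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return inter / union if union else 0.0


def label_weights(Y):
    """Weight each label by its normalized weighted degree in a Jaccard-weighted graph.

    ASSUMPTION: the abstract only says graph information and the Jaccard index
    are combined; normalized weighted degree is an illustrative choice.
    """
    q = len(Y)
    degree = [sum(jaccard(Y[j], Y[k]) for k in range(q) if k != j) for j in range(q)]
    total = sum(degree)
    return [d / total for d in degree] if total else [1.0 / q] * q


class OnlineSelector:
    """Hypothetical streaming selector: mean-significance gate, global filter, local redundancy check."""

    def __init__(self, Y):
        self.Y = Y                    # label columns, each a 0/1 list over the same samples
        self.w = label_weights(Y)
        self.mean_sig = 0.0           # running mean significance of all features seen so far
        self.seen = 0
        self.selected = {}            # feature name -> (values, significance)

    def significance(self, x):
        # ASSUMPTION: label-weighted sum of mutual information with each label.
        return sum(w * mutual_info(x, y) for w, y in zip(self.w, self.Y))

    def process(self, name, x):
        """Handle one newly arrived feature; return True if it is kept."""
        sig = self.significance(x)
        self.seen += 1
        self.mean_sig += (sig - self.mean_sig) / self.seen  # incremental mean update

        # Global step (assumed rule): drop selected features now below the running mean.
        self.selected = {k: v for k, v in self.selected.items() if v[1] >= self.mean_sig}
        if sig < self.mean_sig:
            return False

        # Local step (hypothetical rule): pairwise redundancy with already selected features.
        overlap = {k: mutual_info(x, xs) for k, (xs, _) in self.selected.items()}
        if any(self.selected[k][1] >= sig and m >= sig for k, m in overlap.items()):
            return False  # an existing, at least as significant feature already covers x
        for k, m in overlap.items():
            if sig > self.selected[k][1] and m >= self.selected[k][1]:
                del self.selected[k]  # x supersedes this weaker, overlapping feature
        self.selected[name] = (x, sig)
        return True
```

Features are fed one at a time to mirror the streaming setting: with two labels, an exact copy of an already selected feature is rejected as redundant, a feature carrying no information about either label falls below the running mean, and a feature that tracks the second label is kept alongside the first. The incremental mean avoids storing past significance values, which suits an unbounded feature stream.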

CLC number: TP391