
Journal of Shandong University (Natural Science), 2024, Vol. 59, Issue (5): 90-99. doi: 10.6040/j.issn.1671-9352.7.2023.148


Multi-label online stream feature selection based on high-dimensional correlation

ZHU Liquan1,2, LIN Yaojin1,2, MAO Yu1,2, CHENG Yuxuan1,2

  1. School of Computer Science, Minnan Normal University, Zhangzhou 363000, Fujian, China
  2. Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou 363000, Fujian, China
  • Published: 2024-05-09
  • Supported by:
    Natural Science Foundation of Fujian Province (2022J01914)

Abstract: This paper proposes a multi-label online stream feature selection algorithm based on high-dimensional correlation. The algorithm applies an equivalent mapping to the label space and constructs a weighted undirected graph over the resulting high-dimensional label space. Graph information and the Jaccard index are used to measure the high-dimensional weights between labels, and the high-dimensional label correlation is then used to compute the significance of each newly arrived feature. The significance level of a new feature is judged against an iteratively updated mean significance. An online feature selection algorithm that balances global and local information then dynamically optimizes the selected feature subset: the global correlation between the selected features and the label space is used to filter out irrelevant features, and the local correlation among the selected features is used to eliminate redundant ones. Comparative experiments against six multi-label feature selection methods verify the effectiveness of the proposed algorithm.

Key words: multi-label feature selection, online streaming feature, high-dimensional correlation, label weight
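
Two ingredients named in the abstract, Jaccard-based label weights and a significance test against an iteratively updated mean, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the equivalent mapping, the graph construction, or the significance formula, so the function `jaccard_label_weights`, the class `RunningMeanGate`, and the rule "accept a feature whose score is at least the current mean" are hypothetical names and assumptions chosen only for illustration.

```python
import numpy as np


def jaccard_label_weights(Y):
    """Pairwise Jaccard index between the columns of a binary label matrix.

    Y has shape (n_samples, n_labels). The result can be read as the
    adjacency matrix of a weighted undirected label graph (assumption:
    one edge weight per label pair, no self-loops).
    """
    Y = np.asarray(Y).astype(int)
    inter = Y.T @ Y                      # co-occurrence counts |A ∩ B|
    counts = np.diag(inter)              # per-label positive counts |A|
    union = counts[:, None] + counts[None, :] - inter  # |A ∪ B|
    W = np.zeros(inter.shape, dtype=float)
    nonzero = union > 0                  # avoid division by zero
    W[nonzero] = inter[nonzero] / union[nonzero]
    np.fill_diagonal(W, 0.0)             # drop self-loops
    return W


class RunningMeanGate:
    """Hypothetical gate: keep a streaming feature if its significance
    score is at least the mean of all scores seen before it."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def accept(self, score):
        # The first feature has no history, so it is kept by default.
        keep = self.count == 0 or score >= self.mean
        # Incremental mean update; no score history is stored.
        self.count += 1
        self.mean += (score - self.mean) / self.count
        return keep


if __name__ == "__main__":
    Y = [[1, 1, 0],
         [1, 0, 0],
         [0, 1, 1],
         [1, 1, 0]]
    print(jaccard_label_weights(Y))

    gate = RunningMeanGate()
    print([gate.accept(s) for s in (0.4, 0.2, 0.5, 0.3)])
```

The incremental mean update keeps memory constant as features arrive one at a time, which suits the streaming setting; how the paper actually scores each feature against the label graph would replace the placeholder scores used here.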

CLC number: TP391