Journal of Shandong University (Natural Science) ›› 2022, Vol. 57 ›› Issue (8): 39-52. doi: 10.6040/j.issn.1671-9352.7.2021.141

Multi-label feature selection with streaming and missing labels

ZHANG Zhi-hao1,2, LIN Yao-jin1,2*, LU Shun1,2, WU Yi-lin1,2, WANG Chen-xi1,2

  1. School of Computer Science, Minnan Normal University, Zhangzhou 363000, Fujian, China;
  2. Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou 363000, Fujian, China
  • Publication date: 2022-08-20  Release date: 2022-06-29
  • About the authors: ZHANG Zhi-hao (born 1996), male, master's student; research interest: data mining. E-mail: 313905019@qq.com. *Corresponding author: LIN Yao-jin (born 1980), male, Ph.D., professor, master's supervisor; research interests: data mining and machine learning. E-mail: zzlinyaojin@163.com
  • Funding:
    National Natural Science Foundation of China (62076116); Natural Science Foundation of Fujian Province (2020J01811, 2020J01792)

Abstract: In practical supervised learning tasks, the high dimensionality of the feature space and the dynamic arrival and incompleteness of labels pose severe challenges. To address these problems, a multi-label feature selection algorithm for streaming and missing labels is proposed. First, to reduce the impact of missing labels, the incomplete label matrix is completed by learning label correlations. Second, sparse learning is used to select label-specific features for each newly arrived label. Then, a score is computed from the label-specific features of the labels that have arrived so far, and a representative feature subset is selected according to that score. Finally, experiments on 11 benchmark data sets show that the proposed algorithm selects a representative feature subset and achieves comparatively strong classification performance.
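The abstract names three steps (label completion, sparse per-label selection, score-based subset selection) but does not give the objective functions or the scoring rule. The following is therefore only a minimal sketch under stated assumptions, not the authors' method: missing entries are filled by a cosine-similarity-weighted average of the other labels, the sparse learner is a plain lasso solved with iterative soft-thresholding, and the feature score is the accumulated absolute weight over arrived labels. All names (`complete_labels`, `lasso_ista`, `StreamingLabelSelector`, `lam`) are hypothetical.

```python
import numpy as np


def complete_labels(Y, observed):
    """Fill unobserved entries of Y from correlated labels (assumed scheme)."""
    Y0 = np.where(observed, Y, 0.0)            # treat unobserved entries as 0
    C = Y0.T @ Y0                              # label co-occurrence counts
    norms = np.sqrt(np.diag(C)) + 1e-12
    C = C / np.outer(norms, norms)             # cosine similarity between labels
    np.fill_diagonal(C, 0.0)                   # a label must not predict itself
    est = (Y0 @ C) / (C.sum(axis=0) + 1e-12)   # similarity-weighted average
    return np.where(observed, Y, est)


def lasso_ista(X, y, lam, n_iter=300):
    """Minimise 0.5 * ||Xw - y||^2 + lam * ||w||_1 by soft-thresholding."""
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2, 1e-12)  # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y))                # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w


class StreamingLabelSelector:
    """Processes labels one at a time and keeps a running feature score."""

    def __init__(self, X, lam):
        self.X = X
        self.lam = lam
        self.Y = np.empty((X.shape[0], 0))               # labels seen so far
        self.M = np.empty((X.shape[0], 0), dtype=bool)   # observed-entry masks
        self.scores = np.zeros(X.shape[1])

    def add_label(self, y, observed):
        # Step 1: complete the new label using the labels that already arrived.
        self.Y = np.column_stack([self.Y, y])
        self.M = np.column_stack([self.M, observed])
        y_filled = complete_labels(self.Y, self.M)[:, -1]
        # Step 2: sparse weights act as label-specific features for this label.
        w = lasso_ista(self.X, y_filled, self.lam)
        # Step 3 (assumed rule): accumulate |w| as the feature score.
        self.scores += np.abs(w)
        return w

    def top_k(self, k):
        return np.argsort(-self.scores)[:k]


# Synthetic usage: label 1 depends on feature 0, label 2 on feature 3.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
y1 = (X[:, 0] > 0).astype(float)
y2 = (X[:, 3] > 0).astype(float)
mask2 = rng.random(500) > 0.2   # roughly 20% of label 2 is unobserved

selector = StreamingLabelSelector(X, lam=30.0)
selector.add_label(y1, np.ones(500, dtype=bool))
selector.add_label(y2, mask2)
print(selector.top_k(2))
```

Accumulating absolute weights is only one way to turn label-specific features into a single ranking; because each label is processed once on arrival, the score can be updated without revisiting earlier labels.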

Key words: multi-label learning, feature selection, label-specific feature, missing label, streaming label

CLC number: TP181