Journal of Shandong University (Natural Science) ›› 2024, Vol. 59 ›› Issue (5): 100-113. doi: 10.6040/j.issn.1671-9352.7.2023.4204

Feature selection for partial label learning based on neighborhood rough sets

GAO Hefei1, LI Yan2*, WANG Shuo1

  1. College of Mathematics and Information Science, Hebei University, Baoding 071002, Hebei, China;
  2. School of Applied Mathematics, Beijing Normal University at Zhuhai, Zhuhai 519000, Guangdong, China
  • Published: 2024-05-09
  • Corresponding author: LI Yan (born 1976), female, professor, master's supervisor, Ph.D.; research interests: rough sets and granular computing, and machine learning. E-mail: 39826980@qq.com
  • Funding:
    National Natural Science Foundation of China (61976141); Natural Science Foundation of Hebei Province (F2021201055)

Abstract: A feature selection method for partial label data is proposed within the neighborhood rough set framework. A partial label neighborhood decision system is constructed, the lower approximation and dependency of neighborhood rough sets are defined for partial label learning, and on this basis a feature selection algorithm suited to partial label classification is developed. While granulating the feature space into neighborhoods, the algorithm measures the similarity between labels in the candidate label sets and selects a feature subset that is strongly relevant to the label information. Two mechanisms for generating false positive candidate labels, both different from the commonly used random method, are employed, and their effects on the results are analyzed and compared in the experiments. Finally, extensive experimental results on six real-world partial label data sets and six controlled data sets derived from single-label data are presented to verify the effectiveness of the proposed feature selection method.

Key words: partial label learning, feature selection, partial label neighborhood decision system, neighborhood rough sets

CLC number:

  • TP181
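
The abstract describes the method only at a high level, so the following is a minimal sketch of how a neighborhood-based dependency measure can drive greedy feature selection on partial label data. It is not the algorithm from the paper: the consistency rule (a neighborhood counts as consistent when the candidate label sets of all its members share at least one label), the function names `neighbors`, `dependency`, and `greedy_select`, the radius `delta`, and the toy data are all assumptions made for illustration. With singleton candidate sets the rule reduces to the classical neighborhood rough set positive region.

```python
from math import sqrt


def neighbors(X, i, feats, delta):
    """Indices of samples within Euclidean distance delta of sample i on feats."""
    return [
        j for j in range(len(X))
        if sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in feats)) <= delta
    ]


def dependency(X, candidates, feats, delta):
    """Fraction of samples whose neighborhood is label-consistent.

    Assumed rule (illustrative, not taken from the paper): a neighborhood is
    consistent when the candidate label sets of all its members intersect.
    """
    if not feats:
        return 0.0
    consistent = 0
    for i in range(len(X)):
        shared = set(candidates[i])
        for j in neighbors(X, i, feats, delta):
            shared &= set(candidates[j])
        if shared:
            consistent += 1
    return consistent / len(X)


def greedy_select(X, candidates, delta, eps=1e-9):
    """Forward selection: repeatedly add the feature with the largest dependency."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(dependency(X, candidates, selected + [f], delta), f)
                  for f in remaining]
        # Ties are broken in favor of the lowest feature index.
        score, feat = max(scored, key=lambda s: (s[0], -s[1]))
        if score - best <= eps:  # stop when no feature improves the dependency
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Hypothetical toy data: four samples, two features, candidate label sets.
X = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.5], [0.9, 0.3]]
candidates = [{0, 1}, {0}, {1, 2}, {1}]

print(greedy_select(X, candidates, delta=0.3))  # ([0], 1.0)
```

In this toy example feature 0 alone separates the samples into two neighborhoods whose candidate sets overlap, so its dependency is 1.0 and the search stops after one step; feature 1 alone yields 0.5 because two samples have neighbors with disjoint candidate sets.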