Feature selection for partial label learning based on neighborhood rough sets

doi:10.6040/j.issn.1671-9352.7.2023.4204

Abstract

A feature selection method for partial label data is proposed within the neighborhood rough set framework. A partial label neighborhood decision system is constructed, the lower approximation and dependency of neighborhood rough sets are defined for the partial label learning problem, and a feature selection algorithm suited to partial label classification is built on these definitions. While granulating the feature space into neighborhoods, the algorithm also measures the similarity between labels in the candidate label sets, and so selects a feature subset that is strongly relevant to the label information. Two mechanisms for generating false positive candidate labels are used, both different from the most commonly used random method, and the experiments analyze and compare the effect of the different generation mechanisms. Finally, extensive comparative results on six real-world partial label data sets and six controlled data sets built from single-label data are presented to verify the effectiveness of the proposed feature selection method.

Key words: partial label learning, feature selection, partial label neighborhood decision system, neighborhood rough sets
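The abstract names the ingredients (neighborhood granulation, a dependency measure over candidate label sets, and a selection algorithm) but does not give the definitions. The sketch below is only an illustration of that general idea under stated assumptions, not the authors' algorithm: it assumes Euclidean neighborhoods of radius `delta`, treats a sample as consistent when the candidate label sets of all its neighbors share at least one label, and uses plain greedy forward selection. The function names, the `delta` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def neighborhoods(X, features, delta):
    """Boolean matrix N: N[i, j] is True when sample j lies within
    Euclidean distance delta of sample i on the chosen features."""
    sub = X[:, features]
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    return dist <= delta

def dependency(X, candidates, features, delta):
    """Fraction of samples whose neighborhood is label-consistent.

    ASSUMPTION (not taken from the paper): a neighborhood is consistent
    when the candidate label sets of all its members share a label.
    """
    N = neighborhoods(X, features, delta)
    consistent = 0
    for i in range(X.shape[0]):
        # Logical AND over the candidate-label rows of all neighbors.
        common = candidates[N[i]].all(axis=0)
        consistent += bool(common.any())
    return consistent / X.shape[0]

def greedy_select(X, candidates, delta=0.2, eps=1e-9):
    """Forward selection: repeatedly add the feature that raises the
    dependency most, and stop when no feature gives a gain above eps."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [dependency(X, candidates, selected + [f], delta)
                  for f in remaining]
        k = int(np.argmax(scores))
        if scores[k] - best <= eps:
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best

# Toy data (hypothetical): feature 0 separates the two labels,
# feature 1 does not. Rows of C are candidate label sets.
X = np.array([[0.0, 0.5], [0.1, 0.9], [0.9, 0.5], [1.0, 0.9]])
C = np.array([[1, 0], [1, 1], [0, 1], [1, 1]], dtype=bool)
print(greedy_select(X, C, delta=0.2))
```

On this toy data the informative feature alone already makes every neighborhood consistent, so the loop stops after one step. A real implementation would normalize features first and would follow the paper's own lower approximation and label-similarity definitions instead of the intersection rule assumed here.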

CLC number: TP181

GAO Hefei, LI Yan, WANG Shuo. Feature selection for partial label learning based on neighborhood rough sets[J]. Journal of Shandong University (Natural Science), 2024, 59(5): 100-113.

