  • Corresponding author: LI Yan (born 1976), female, professor, master's supervisor, Ph.D.; research interests: rough sets, granular computing, and machine learning. E-mail: 39826980@qq.com

Feature selection for partial label learning based on neighborhood rough sets

GAO Hefei¹, LI Yan²*, WANG Shuo¹

  1. College of Mathematics and Information Science, Hebei University, Baoding 071002, Hebei, China;
  2. School of Applied Mathematics, Beijing Normal University at Zhuhai, Zhuhai 519000, Guangdong, China
  • Published: 2024-05-09

Abstract: A feature selection method for partial label data is proposed within the neighborhood rough set framework. A partial label neighborhood decision system is constructed, the lower approximation and dependency of neighborhood rough sets are defined for partial label learning, and a feature selection algorithm suited to partial label classification is built on these definitions. While granulating the feature space into neighborhoods, the algorithm also measures the similarity between labels in the candidate label sets, and it selects a feature subset that is strongly relevant to the label information. Two mechanisms for generating false positive candidate labels are used, both different from the commonly used random method, and their impact on the results is analyzed and compared in the experiments. Finally, extensive experimental comparisons on six real-world partial label data sets and six controlled partial label data sets derived from single-label data demonstrate the effectiveness of the proposed feature selection method.

Key words: partial label learning, feature selection, partial label neighborhood decision system, neighborhood rough sets
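
The abstract does not state the paper's exact definitions, so the following is only an illustrative sketch of how a neighborhood dependency measure could drive greedy feature selection on partial label data. Everything specific here is an assumption rather than the authors' method: the Euclidean neighborhood radius `delta`, the Jaccard similarity between candidate label sets, the threshold `tau`, and the function names `jaccard`, `dependency`, and `greedy_select` are all hypothetical. A sample is counted in the positive region when every sample in its neighborhood has a sufficiently similar candidate label set.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity between two candidate label sets given as boolean vectors."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

def dependency(X, Y, features, delta=0.2, tau=0.5):
    """Fraction of samples whose delta-neighborhood is label-consistent.

    ASSUMPTION: 'consistent' means every neighbor's candidate label set has
    Jaccard similarity >= tau with the sample's own set. This is a stand-in
    for the paper's lower approximation, which the abstract does not spell out.
    """
    if not features:
        return 0.0
    Z = X[:, features]
    n = len(Z)
    # Pairwise Euclidean distances restricted to the chosen features.
    dist = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))
    pos = 0
    for i in range(n):
        neighbors = np.flatnonzero(dist[i] <= delta)
        if all(jaccard(Y[i], Y[j]) >= tau for j in neighbors):
            pos += 1
    return pos / n

def greedy_select(X, Y, delta=0.2, tau=0.5):
    """Forward selection: add the feature that most increases dependency."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [dependency(X, Y, selected + [f], delta, tau) for f in remaining]
        k = int(np.argmax(scores))
        if scores[k] <= best:  # stop when no candidate improves dependency
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best

# Toy data (hypothetical): feature 0 separates the two groups, feature 1 is noise.
X = np.array([[0.00, 0.00], [0.05, 1.00], [0.10, 0.05],
              [0.90, 0.95], [0.95, 0.10], [1.00, 0.50]])
# Candidate label sets over three labels: {0, 1} for the first group, {2} for the second.
Y = np.array([[1, 1, 0]] * 3 + [[0, 0, 1]] * 3, dtype=bool)

selected, score = greedy_select(X, Y)
print(selected, score)
```

On this toy data the search keeps only the informative feature, because adding the noisy one cannot raise the dependency any further. A real implementation would follow the paper's own lower approximation and label similarity definitions and would normalize features before choosing `delta`.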


