
Journal of Shandong University (Natural Science), 2022, Vol. 57, Issue (12): 13-24. doi: 10.6040/j.issn.1671-9352.7.2021.168


Feature selection using adaptive neighborhood mutual information and spectral clustering

SUN Lin1,2, LIANG Na1, XU Jiu-cheng1,2

  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, Henan, China;
  2. Henan Engineering Laboratory of Smart Business and Internet of Things Technology, Xinxiang 453007, Henan, China
  • Published: 2022-12-05
  • About the author: SUN Lin (born 1979), male, PhD, associate professor and master's supervisor; research interests include granular computing, data mining, and bioinformatics. E-mail: sunlin@htu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62076089, 61976082); Key Science and Technology Research Project of Henan Province (212102210136)



Abstract: Neighborhood rough sets handle continuous data well, whereas traditional spectral clustering algorithms require their parameters to be set manually. Building on the former and addressing the latter, this paper proposes a feature selection algorithm based on adaptive neighborhood mutual information and spectral clustering. First, the standard deviation set and the adaptive neighborhood set of each object under an attribute are defined. On this basis, uncertainty measures including adaptive neighborhood entropy, average neighborhood entropy, joint entropy, neighborhood conditional entropy, and neighborhood mutual information are introduced, and the adaptive neighborhood mutual information is used to rank features by their relevance to the class labels. Second, a shared nearest neighbor adaptive spectral clustering algorithm groups strongly correlated features into the same feature cluster, so that features in different clusters are strongly dissimilar. Finally, a feature selection algorithm is designed using the minimum-redundancy maximum-relevance (mRMR) criterion. Experiments on ten datasets, evaluating the number of selected features and the classification accuracy, verify the effectiveness of the proposed algorithm.

Key words: feature selection, neighborhood rough set, adaptive neighborhood mutual information, spectral clustering, minimum redundancy maximum relevance
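The ranking and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm and rests on several assumptions. First, each feature's neighborhood radius is taken as its population standard deviation divided by a hypothetical factor `lam`. Second, neighborhood entropy uses the common form NH = -(1/n) Σ log2(|δ(x_i)|/n). Third, neighborhood mutual information is computed as NH(A) + NH(B) - NH(A ∪ B), with the joint neighborhood taken as the intersection. The shared nearest neighbor spectral clustering step is omitted, so mRMR is applied over all features rather than within feature clusters. All function names are hypothetical.

```python
import math
from statistics import pstdev


def feature_neighborhoods(col, radius):
    """delta-neighborhood of every object on one feature: {j : |x_j - x_i| <= radius}."""
    return [frozenset(j for j, v in enumerate(col) if abs(v - col[i]) <= radius)
            for i in range(len(col))]


def label_neighborhoods(labels):
    """For a discrete label, the neighborhood of x_i is its decision class."""
    return [frozenset(j for j, y in enumerate(labels) if y == labels[i])
            for i in range(len(labels))]


def entropy(nbs):
    """Neighborhood entropy NH = -(1/n) * sum_i log2(|N(x_i)| / n)."""
    n = len(nbs)
    return -sum(math.log2(len(nb) / n) for nb in nbs) / n


def mutual_information(nbs_a, nbs_b):
    """NMI(A; B) = NH(A) + NH(B) - NH(A u B); the joint neighborhood is the intersection."""
    joint = [a & b for a, b in zip(nbs_a, nbs_b)]
    return entropy(nbs_a) + entropy(nbs_b) - entropy(joint)


def select_features(X, labels, k, lam=2.0):
    """Greedy mRMR selection of up to k feature indices from row-major data X.

    Assumption (hypothetical): each feature's radius is pstdev(feature) / lam.
    """
    columns = [list(c) for c in zip(*X)]
    nbs = [feature_neighborhoods(c, pstdev(c) / lam) for c in columns]
    nbs_d = label_neighborhoods(labels)
    # relevance of each feature to the label, used for the initial ranking
    relevance = [mutual_information(nb, nbs_d) for nb in nbs]
    selected = [max(range(len(columns)), key=lambda f: relevance[f])]
    while len(selected) < min(k, len(columns)):
        def score(f):
            # mRMR: relevance minus mean redundancy with already selected features
            redundancy = sum(mutual_information(nbs[f], nbs[s]) for s in selected)
            return relevance[f] - redundancy / len(selected)
        candidates = [f for f in range(len(columns)) if f not in selected]
        selected.append(max(candidates, key=score))
    return selected, relevance


X = [[1.0, 5.0, 0.3], [1.1, 4.0, 0.9], [3.0, 5.1, 0.2], [3.2, 4.2, 0.8]]
selected, relevance = select_features(X, [0, 0, 1, 1], k=2)
print(selected, relevance)  # prints [0, 1] [1.0, 0.0, 0.0]
```

In this toy table only the first feature separates the two classes, so it receives the highest relevance and is selected first. The pairwise neighborhood computation is quadratic in the number of objects, which is fine for illustration but would need indexing or vectorization on real datasets.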

CLC number: TP181