Journal of Shandong University (Natural Science), 2020, Vol. 55, Issue 3: 58-69. doi: 10.6040/j.issn.1671-9352.1.2019.154
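Abstract:
A clustering method for multi-label symbolic value partition (CMSVP) is proposed. First, the original label information is clustered using label ranking and the K-means algorithm. Then, an undirected weighted graph is built for each attribute: each node represents one attribute value, and the weight of an edge represents the similarity between the two nodes it connects. Finally, random walks are performed on all of the undirected weighted graphs to obtain a clustering scheme for the attribute values. Experiments were carried out on 6 multi-label datasets. The results show that CMSVP compresses the data effectively while also improving classification performance to some extent.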
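The graph-construction and random-walk steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the similarity measure or the walk parameters, so everything below is an assumption made for illustration: the function names `value_graph` and `random_walk_partition`, the use of cosine similarity between label-cluster histograms as edge weight, the self-loops of weight 1, and the `steps` and `threshold` parameters. The label clusters (`cluster_ids`) are taken as given, standing in for the output of the label-ranking and K-means step.

```python
from collections import Counter, defaultdict
from math import sqrt


def value_graph(values, cluster_ids):
    """Build an undirected weighted graph for one symbolic attribute.

    values[i] is the attribute value of instance i; cluster_ids[i] is the
    label cluster of instance i (assumed to come from an earlier K-means step).
    Edge weight = cosine similarity of the two values' cluster histograms
    (an assumed similarity measure, not taken from the paper).
    """
    hist = defaultdict(Counter)
    for v, c in zip(values, cluster_ids):
        hist[v][c] += 1
    nodes = sorted(hist)

    def cosine(a, b):
        dot = sum(a[k] * b[k] for k in a)
        na = sqrt(sum(x * x for x in a.values()))
        nb = sqrt(sum(x * x for x in b.values()))
        return dot / (na * nb)

    weights = {}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            weights[(u, v)] = cosine(hist[u], hist[v])
    return nodes, weights


def random_walk_partition(nodes, weights, steps=3, threshold=0.5):
    """Group attribute values whose random-walk distributions are close."""
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    # Symmetric weight matrix with self-loops of weight 1 (assumed lazy walk).
    w = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (u, v), s in weights.items():
        w[index[u]][index[v]] = w[index[v]][index[u]] = s
    # Row-normalise the weights into transition probabilities.
    p = [[x / sum(row) for x in row] for row in w]
    # dist[i] = distribution of a walk started at node i after `steps` steps.
    dist = [row[:] for row in p]
    for _ in range(steps - 1):
        dist = [[sum(d[k] * p[k][j] for k in range(n)) for j in range(n)]
                for d in dist]
    # Union-find merge of values whose walk distributions are close in L1.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sum(abs(a - b) for a, b in zip(dist[i], dist[j])) < threshold:
                parent[find(j)] = find(i)
    groups = defaultdict(list)
    for i, v in enumerate(nodes):
        groups[find(i)].append(v)
    return sorted(groups.values())


# Toy data: values 'a' and 'b' only occur with label cluster 0,
# values 'c' and 'd' only with label cluster 1.
values = ['a', 'b', 'a', 'c', 'd', 'c', 'b', 'd']
cluster_ids = [0, 0, 0, 1, 1, 1, 0, 1]
nodes, weights = value_graph(values, cluster_ids)
partition = random_walk_partition(nodes, weights)
print(partition)  # [['a', 'b'], ['c', 'd']]
```

In this toy case the four symbolic values collapse into two groups, which is the kind of compression the abstract refers to: each original value can be replaced by the identifier of its group. The merging rule used here (an L1 threshold on walk distributions) is one simple choice among many and should not be read as the rule used in CMSVP.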
|