JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2024, Vol. 59 ›› Issue (3): 107-117. doi: 10.6040/j.issn.1671-9352.2.2023.027


The ML-KNN method based on attribute weighting

Xin WEN1, Deyu LI1,2,*

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China
  • Received: 2023-05-29  Online: 2024-03-20  Published: 2024-03-06
  • Contact: Deyu LI, E-mail: 1368661957@qq.com; lidysxu@163.com

Abstract:

An ML-KNN method based on attribute weighting is proposed. Specifically, for each label, samples in the non-positive regions of the decision classes are first identified by means of the variable precision neighborhood rough set model, and heterogeneous sample pairs are constructed from them. Then, the significance of each attribute for classification is evaluated according to its ability to discern the heterogeneous sample pairs. Finally, weighted distances between samples are calculated to obtain each sample's nearest-neighbor distribution, and multi-label classification is carried out under the principle of maximizing the posterior probability. Experimental results on ten public multi-label data sets verify the effectiveness of the proposed method.

Key words: multi-label classification, attribute significance, neighborhood rough set, uncertainty of classification, heterogeneous sample pair

CLC Number: TP391

Table 1  Description of datasets

Number | Data set | Samples | Attributes | Labels | Domain
1 | GpositivePseAAC | 519 | 440 | 4 | biology
2 | Emotions | 593 | 72 | 6 | music
3 | Medical | 978 | 1449 | 45 | text
4 | Water-quality | 1060 | 16 | 14 | chemistry
5 | Image | 2000 | 294 | 5 | image
6 | Scene | 2407 | 294 | 6 | image
7 | Yeast | 2417 | 103 | 14 | biology
8 | Business | 5000 | 438 | 30 | text
9 | Yelp | 10810 | 671 | 5 | text
10 | Mediamill | 43907 | 120 | 101 | video

Table 2  The variation range of the neighborhood parameter δ

Data set | δ
GpositivePseAAC | 4.00~4.35
Emotions | 1.30~1.55
Medical | 2.80~3.05
Water-quality | 1.50~1.75
Image | 4.30~4.60
Scene | 2.75~3.00
Yeast | 1.25~1.50
Business | 1.70~1.95
Yelp | 6.00~6.25
Mediamill | 2.00~2.35

Fig. 1  Classification performance of NRS_MLKNN under different parameters on the GpositivePseAAC dataset

Table 3  Classification performance of seven algorithms on GpositivePseAAC dataset
(Note: HL = Hamming loss, RL = ranking loss, OE = one-error, CV = coverage, AP = average precision; ↓ means lower is better, ↑ means higher is better. The same columns apply to Tables 4-12; see the sketch after this table.)

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.1638±0.0303 | 0.1606±0.0241 | 0.3237±0.0467 | 0.4875±0.0631 | 0.8137±0.0261
LPLC | 0.1646±0.0242 | 0.1620±0.0367 | 0.2852±0.0594 | 0.4644±0.1071 | 0.8248±0.0365
ML-KNN | 0.1551±0.0267 | 0.1572±0.0296 | 0.3102±0.0596 | 0.4799±0.0844 | 0.8195±0.0336
Stacked_KNN | 0.1483±0.0352 | 0.1591±0.0379 | 0.3140±0.0613 | 0.4875±0.1136 | 0.8175±0.0378
LAMLKNN | 0.1541±0.0290 | 0.1493±0.0257 | 0.2929±0.0557 | 0.4547±0.0737 | 0.8295±0.0289
ML_RKNN | 0.2481±0.0285 | 0.5833±0.0772 | 0.2333±0.0441 | 0.9772±0.1476 | 0.6757±0.0454
NRS_MLKNN | 0.1474±0.0294 | 0.1469±0.0302 | 0.2890±0.0644 | 0.4490±0.0874 | 0.8316±0.0353

Table 4  Classification performance of seven algorithms on Emotions dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.1930±0.0160 | 0.1696±0.0239 | 0.2631±0.0557 | 1.8042±0.1467 | 0.8014±0.0274
LPLC | 0.2025±0.0226 | 0.1595±0.0261 | 0.2731±0.0408 | 1.7684±0.1607 | 0.8023±0.0235
ML-KNN | 0.1925±0.0172 | 0.1621±0.0173 | 0.2666±0.0305 | 1.7975±0.0885 | 0.7996±0.0155
Stacked_KNN | 0.1986±0.0241 | 0.1727±0.0268 | 0.2682±0.0545 | 1.8482±0.1554 | 0.7935±0.0319
LAMLKNN | 0.1950±0.0148 | 0.1595±0.0233 | 0.2832±0.0574 | 1.7620±0.1342 | 0.8003±0.0265
ML_RKNN | 0.3232±0.0324 | 0.3399±0.0594 | 0.3794±0.0526 | 2.6643±0.3346 | 0.6865±0.0404
NRS_MLKNN | 0.1950±0.0121 | 0.1592±0.0123 | 0.2632±0.0363 | 1.7773±0.0815 | 0.8035±0.0147

Table 5  Classification performance of seven algorithms on Medical dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.0187±0.0023 | 0.1043±0.0247 | 0.3385±0.0441 | 3.6157±1.0781 | 0.7456±0.0339
LPLC | 0.0188±0.002 | 0.0795±0.0159 | 0.2833±0.0367 | 4.4015±1.0922 | 0.7578±0.0378
ML-KNN | 0.0156±0.0021 | 0.0420±0.0114 | 0.2496±0.0417 | 2.7451±0.8187 | 0.8083±0.0305
Stacked_KNN | 0.015±0.0021 | 0.0576±0.0155 | 0.2485±0.0373 | 3.5514±1.0529 | 0.7910±0.0272
LAMLKNN | 0.0159±0.0021 | 0.0374±0.0105 | 0.2445±0.0422 | 2.2252±0.6975 | 0.8165±0.0294
ML_RKNN | 0.0522±0.0067 | 0.4310±0.0483 | 0.2730±0.0333 | 13.5643±2.004 | 0.5224±0.0339
NRS_MLKNN | 0.0140±0.0022 | 0.0424±0.0112 | 0.2209±0.0320 | 2.7955±0.7872 | 0.8204±0.0240

Table 6  Classification performance of seven algorithms on Water-quality dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.3408±0.0098 | 0.2978±0.0161 | 0.3369±0.0504 | 9.1745±0.2060 | 0.6457±0.0226
LPLC | 0.3163±0.0091 | 0.2634±0.0158 | 0.2846±0.0424 | 8.8896±0.2201 | 0.6845±0.0192
ML-KNN | 0.2920±0.0112 | 0.2594±0.0135 | 0.2932±0.0524 | 8.7764±0.2412 | 0.6898±0.0202
Stacked_KNN | 0.2971±0.0093 | 0.2667±0.0167 | 0.3197±0.0476 | 8.8377±0.1937 | 0.6775±0.0201
LAMLKNN | 0.2947±0.0089 | 0.2618±0.0140 | 0.2790±0.0337 | 8.8538±0.2787 | 0.6883±0.0189
ML_RKNN | 0.4044±0.0205 | 0.3853±0.0172 | 0.4232±0.0414 | 10.3179±0.2412 | 0.5900±0.0182
NRS_MLKNN | 0.2904±0.0097 | 0.2597±0.0164 | 0.2799±0.0462 | 8.7764±0.2569 | 0.6915±0.0225

Table 7  Classification performance of seven algorithms on Image dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.1754±0.0147 | 0.1868±0.0196 | 0.3335±0.0341 | 0.9780±0.1038 | 0.7862±0.0204
LPLC | 0.1784±0.0142 | 0.1968±0.0207 | 0.3300±0.0287 | 0.9995±0.0993 | 0.7808±0.0171
ML-KNN | 0.1701±0.0141 | 0.1765±0.0202 | 0.3195±0.0332 | 0.9780±0.1034 | 0.7900±0.0203
Stacked_KNN | 0.1765±0.0162 | 0.1880±0.0232 | 0.3330±0.0300 | 1.0180±0.1157 | 0.7806±0.0221
LAMLKNN | 0.1708±0.0153 | 0.1772±0.0204 | 0.3210±0.0323 | 0.9830±0.1128 | 0.7885±0.0208
ML_RKNN | 0.2871±0.0139 | 0.3174±0.0259 | 0.3780±0.0303 | 1.3465±0.0964 | 0.7167±0.0203
NRS_MLKNN | 0.1717±0.0157 | 0.1747±0.0216 | 0.3200±0.0361 | 0.9685±0.1121 | 0.7915±0.0219

Table 8  Classification performance of seven algorithms on Scene dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.0923±0.0061 | 0.0992±0.0115 | 0.2530±0.0171 | 0.5393±0.0644 | 0.8486±0.0117
LPLC | 0.0965±0.0065 | 0.0908±0.0106 | 0.2505±0.0218 | 0.5198±0.0626 | 0.8470±0.0131
ML-KNN | 0.0852±0.0082 | 0.0768±0.0091 | 0.2260±0.0159 | 0.4707±0.0593 | 0.8665±0.0099
Stacked_KNN | 0.0879±0.0055 | 0.0853±0.0086 | 0.2322±0.0138 | 0.5156±0.0568 | 0.8591±0.0086
LAMLKNN | 0.0855±0.0067 | 0.0740±0.0087 | 0.2252±0.0118 | 0.4558±0.0526 | 0.8678±0.0084
ML_RKNN | 0.1649±0.0089 | 0.2547±0.0313 | 0.2862±0.0304 | 1.1088±0.1082 | 0.7611±0.0209
NRS_MLKNN | 0.0847±0.0064 | 0.0754±0.0094 | 0.2202±0.0154 | 0.4637±0.0618 | 0.8695±0.0104

Table 9  Classification performance of seven algorithms on Yeast dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.2044±0.0090 | 0.1815±0.0090 | 0.2400±0.0206 | 6.3916±0.2132 | 0.7482±0.0137
LPLC | 0.2040±0.0125 | 0.1689±0.0097 | 0.2292±0.0295 | 6.3116±0.1871 | 0.7624±0.0184
ML-KNN | 0.1927±0.0066 | 0.1643±0.0087 | 0.2305±0.0260 | 6.2024±0.1689 | 0.7658±0.0137
Stacked_KNN | 0.1985±0.0099 | 0.1793±0.0092 | 0.2549±0.0303 | 6.5090±0.1257 | 0.7491±0.0171
LAMLKNN | 0.1938±0.0070 | 0.1651±0.0084 | 0.2259±0.0221 | 6.2227±0.1532 | 0.7651±0.0131
ML_RKNN | 0.3759±0.0182 | 0.3813±0.0211 | 0.4674±0.0347 | 9.0802±0.2188 | 0.5751±0.0211
NRS_MLKNN | 0.1928±0.0066 | 0.1634±0.0083 | 0.2276±0.0218 | 6.1962±0.1645 | 0.7677±0.0133

Table 10  Classification performance of seven algorithms on Business dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.0277±0.0017 | 0.1177±0.0136 | 0.1256±0.0196 | 4.6306±0.4437 | 0.8544±0.0162
LPLC | 0.0267±0.0019 | 0.0612±0.0047 | 0.1246±0.0210 | 3.3930±0.2359 | 0.8626±0.0159
ML-KNN | 0.0269±0.0017 | 0.0400±0.0046 | 0.1194±0.0171 | 2.2552±0.1714 | 0.8791±0.0131
Stacked_KNN | 0.0261±0.0013 | 0.0387±0.0032 | 0.1096±0.0134 | 2.2478±0.1480 | 0.8834±0.0095
LAMLKNN | 0.0268±0.0018 | 0.0401±0.0046 | 0.1192±0.0193 | 2.2654±0.1717 | 0.8793±0.0136
ML_RKNN | 0.1109±0.0038 | 0.3978±0.0238 | 0.4854±0.0299 | 13.5964±0.8459 | 0.4683±0.0148
NRS_MLKNN | 0.0266±0.0017 | 0.0389±0.0036 | 0.1150±0.0173 | 2.2170±0.1381 | 0.8816±0.0120

Table 11  Classification performance of seven algorithms on Yelp dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.2264 | 0.3701 | 0.5401 | 0.8143 | 0.6448
LPLC | 0.2314 | 0.3317 | 0.4758 | 0.8306 | 0.6609
ML-KNN | 0.1798 | 0.2821 | 0.5159 | 0.7077 | 0.6721
Stacked_KNN | 0.2345 | 0.3353 | 0.5076 | 0.8903 | 0.6526
LAMLKNN | 0.1804 | 0.2668 | 0.5007 | 0.6673 | 0.6835
ML_RKNN | 0.1748 | 0.9366 | 0.0490 | 0.9541 | 0.5956
NRS_MLKNN | 0.1774 | 0.2765 | 0.4993 | 0.6939 | 0.6818

Table 12  Classification performance of seven algorithms on Mediamill dataset

Method | HL↓ | RL↓ | OE↓ | CV↓ | AP↑
MLRS | 0.0328 | 0.1567 | 0.1684 | 28.6587 | 0.6767
LPLC | 0.0358 | 0.0913 | 0.1503 | 28.7615 | 0.6820
ML-KNN | 0.0315 | 0.0550 | 0.1473 | 18.6456 | 0.7034
Stacked_KNN | 0.0350 | 0.0650 | 0.1637 | 20.6667 | 0.6776
LAMLKNN | 0.0316 | 0.0533 | 0.1480 | 17.9071 | 0.7032
ML_RKNN | 0.0441 | 0.7008 | 0.0653 | 57.8221 | 0.3054
NRS_MLKNN | 0.0314 | 0.0550 | 0.1467 | 18.6420 | 0.7035

Fig. 2  The average rank of various methods on ten data sets with respect to five evaluation metrics

[1] YU Ying, PEDRYCZ W, MIAO Duoqian. Multi-label classification by exploiting label correlations[J]. Expert Systems with Applications, 2014, 41(6): 2989-3004. doi: 10.1016/j.eswa.2013.10.030
[2] TSOUMAKAS G, KATAKIS I. Multi-label classification: an overview[J]. International Journal of Data Warehousing and Mining, 2007, 3(3): 1-13. doi: 10.4018/jdwm.2007070101
[3] ZHANG Minling, ZHOU Zhihua. A review on multi-label learning algorithms[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819-1837. doi: 10.1109/TKDE.2013.39
[4] KASHEF S, NEZAMABADI-POUR H. A label-specific multi-label feature selection algorithm based on the Pareto dominance concept[J]. Pattern Recognition, 2019, 88: 654-667. doi: 10.1016/j.patcog.2018.12.020
[5] LEE J, SEO W, PARK J H, et al. Compact feature subset-based multi-label music categorization for mobile devices[J]. Multimedia Tools and Applications, 2019, 78(4): 4869-4883. doi: 10.1007/s11042-018-6100-8
[6] WANG R, RIDLEY R, SU X A, et al. A novel reasoning mechanism for multi-label text classification[J]. Information Processing and Management, 2021, 58(2): 102441. doi: 10.1016/j.ipm.2020.102441
[7] FABRIS F, FREITAS A A. Dependency network methods for hierarchical multi-label classification of gene functions[C]//2014 IEEE Symposium on Computational Intelligence and Data Mining. Piscataway: IEEE, 2014: 241-248.
[8] AKHAND B, DEVI V S. Multi-label classification of discrete data[C]//IEEE International Conference on Fuzzy Systems. Piscataway: IEEE, 2013: 1-5.
[9] BOUTELL M R, LUO J B, SHEN X P, et al. Learning multi-label scene classification[J]. Pattern Recognition, 2004, 37(9): 1757-1771. doi: 10.1016/j.patcog.2004.03.009
[10] TSOUMAKAS G, KATAKIS I, VLAHAVAS I P. Random k-labelsets for multilabel classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(7): 1079-1089. doi: 10.1109/TKDE.2010.164
[11] READ J, PFAHRINGER B, HOLMES G, et al. Classifier chains for multi-label classification[C]//Machine Learning and Knowledge Discovery in Databases: European Conference. Berlin: Springer, 2009, 5782: 254-269.
[12] ZHANG Minling, ZHOU Zhihua. ML-kNN: a lazy learning approach to multi-label learning[J]. Pattern Recognition, 2007, 40(7): 2038-2048. doi: 10.1016/j.patcog.2006.12.019
[13] PAKRASHI A, NAMEE B M. Stacked-MLkNN: a stacking based improvement to multi-label k-nearest neighbours[C]//First International Workshop on Learning with Imbalanced Domains: Theory and Applications. New York: PMLR, 2017, 74: 51-63.
[14] WANG Dengbao, WANG Jingyuan, HU Fei, et al. A locally adaptive multi-label k-nearest neighbor algorithm[C]//Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference. Berlin: Springer, 2018, 10937: 81-93.
[15] SADHUKHAN P, PALIT S. Multi-label learning on principles of reverse k-nearest neighbourhood[J/OL]. Expert Systems, 2020. doi: 10.1111/exsy.12615
[16] DUAN Jie, HU Qinghua, ZHANG Lingjun, et al. Feature selection for multi-label classification based on neighborhood rough sets[J]. Journal of Computer Research and Development, 2015, 52(1): 56-65. (in Chinese)
[17] ZHANG Wenxiu, WU Weizhi, LIANG Jiye, et al. Rough sets theory and methods[M]. Beijing: Science Press, 2001: 232. (in Chinese)
[18] HU Qinghua, YU Daren, LIU Jinfu, et al. Neighborhood rough set based heterogeneous feature subset selection[J]. Information Sciences, 2008, 178(18): 3577-3594. doi: 10.1016/j.ins.2008.05.024
[19] ZHANG Jing, LI Deyu, WANG Suge, et al. Multi-label text classification based on robust fuzzy rough set model[J]. Computer Science, 2015, 42(7): 270-275. (in Chinese)
[20] DAI Jianhua, HU Hu, WU Weizhi, et al. Maximal-discernibility-pair-based approach to attribute reduction in fuzzy rough sets[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(4): 2174-2187. doi: 10.1109/TFUZZ.2017.2768044
[21] QIAN Wenbin, HUANG Jintao, WANG Yinglong, et al. Label distribution feature selection for multi-label classification with rough set[J]. International Journal of Approximate Reasoning, 2021, 128: 32-55. doi: 10.1016/j.ijar.2020.10.002
[22] WEN Xin, LI Deyu, WANG Suge. A method for feature selection based on neighborhood relation and fuzzy decision[J]. Journal of Nanjing University (Natural Sciences), 2018, 54(4): 733-741. (in Chinese)
[23] HUANG Jun, LI Guorong, WANG Shuhui, et al. Multi-label classification by exploiting local positive and negative pairwise label correlation[J]. Neurocomputing, 2017, 257: 164-174. doi: 10.1016/j.neucom.2016.12.073