《山东大学学报(理学版)》 ›› 2023, Vol. 58 ›› Issue (12): 41-51.doi: 10.6040/j.issn.1671-9352.4.2022.3492
Yu FANG*(
),Huyu ZHENG,Xuemei CAO
摘要:
结合三支决策和合成少数过采样技术(synthetic minority over-sampling technique, SMOTE), 提出了一种新的采样方法—三支过采样(three-way over-sampling, 3WOS)。3WOS通过对所有样本构建三支决策模型, 选取该模型边界域中的样本作为关键样本进行SMOTE过采样, 从而有效缓解样本聚集和分离问题, 在一定程度上提高了分类器性能。该方法首先在少数类样本上应用三支决策和支持向量数据描述, 将所有样本数据进行三分; 其次, 找出所有关键样本的k个最近邻少数类样本, 并使用线性插值方式对每个关键样本合成新样本, 然后形成新的少数类样本; 最后, 将更新后的样本集用于训练分类器。实验结果表明, 3WOS方法比其他方法在基分类器上有较好的分类准确度、F-measure、G-mean和较少的代价值。
中图分类号:
| 1 | ZHENG Z H , WU X Y , SRIHARI R . Feature selection for text categorization on imbalanced data[J]. ACM Sigkdd Explorations Newsletter, 2004, 6 (1): 80- 89. |
| 2 | HE H B , GARCIA E A . Learning from imbalanced data[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21 (9): 1263- 1284. |
| 3 | ANAND A , PUGALENTHI G , FOGEL G B , et al. An approach for classification of highly imbalanced data using weighting and undersampling[J]. Amino Acids, 2010, 39 (5): 1385- 1391. |
| 4 | LIU L , CAI Y D , LU W C , et al. Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection[J]. Biochemical and Biophysical Research Communications, 2009, 380 (2): 318- 322. |
| 5 | CHAWLA N V , BOWYER K W , HALL L O , et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16 (1): 321- 357. |
| 6 | HAN H, WANG W Y, MAO B H. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning[C]//Proceedings of the International Conference on Intelligent Computing. Berlin: Springer, 2005: 878-887. |
| 7 | HE H B, YANG B, GARCIA E A, et al. ADASYN: adaptive synthetic sampling approach for imbalanced learning[C]//Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). Hong Kong: IEEE, 2008: 1322-1328. |
| 8 | BARUA S , ISLAM M M , YAO X , et al. MWMOTE: majority weighted minority oversampling technique for imbalanced data set learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 26 (2): 405- 425. |
| 9 | 祝团飞, 孙婧, 李宜洲, 等. BOS: 一种用于不平衡数据学习的边界过采样方法[J]. 四川大学学报(自然科学版), 2012, 49 (3): 553- 559. |
| ZHU Tuanfei , SUN Jing , LI Yizhou , et al. BOS: a borderline over-sampling method for imbalanced data learning[J]. Journal of Sichuan University (Natural Science Edition), 2012, 49 (3): 553- 559. | |
| 10 | FANG Y , CAO X M , WANG X , et al. Three-way sampling for rapid attribute reduction[J]. Information Sciences, 2022, 609, 26- 45. |
| 11 | LIU D , LIANG D C , WANG C C . A novel three-way decision model based on incomplete information system[J]. Knowledge-Based Systems, 2016, 91, 32- 45. |
| 12 | YAO Y Y. Three-way decision: an interpretation of rules in rough set theory[C]//Proceedings of the International Conference on Rough Sets and Knowledge Technology. Berlin: Springer, 2009: 642-649. |
| 13 | PAWLAK Z . Rough sets[J]. International Journal of Computer and Information Sciences, 1982, 11 (5): 341- 356. |
| 14 | PAWLAK Z . Rough sets: theoretical aspects of reasoning about data[M]. Dordrecht: Kluwer Academic Publishers, 1992. |
| 15 | YAO Y Y. Decision-theoretic rough set models[C]//Proceedings of the International Conference on Rough Sets and Knowledge Technology. Berlin: Springer, 2007: 1-12. |
| 16 | YAO Y Y. An outline of a theory of three-way decisions[C]//Proceedings of the International Conference on Rough Sets and Current Trends in Computing. Berlin: Springer, 2012: 1-17. |
| 17 | YAN Y T , WU Z B , DU X Q , et al. A three-way decision ensemble method for imbalanced data oversampling[J]. International Journal of Approximate Reasoning, 2019, 107, 1- 16. |
| 18 | TAO X M , ZHENG Y J , CHEN W , et al. SVDD-based weighted oversampling technique for imbalanced and overlapped dataset learning[J]. Information Sciences, 2022, 588, 13- 51. |
| 19 | TAX D M , DUIN R P . Support vector data description[J]. Machine Learning, 2004, 54 (1): 45- 66. |
| 20 | FANG Y, CAO X M, WANG X, et al. Hypersphere neighborhood rough set for rapid attribute reduction[C]//Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. Cham: Springer, 2022: 161-173 |
| 21 | FANG Y , CAO X M , WANG X , et al. Three-way sampling for rapid attribute reduction[J]. Information Sciences, 2022, 609, 26- 45. |
| 22 | JIANG H S, WANG H Y, HU W H, et al. Fast incremental SVDD learning algorithm with the Gaussian kernel[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2019. |
| 23 | JIANG K , LU J , XIA K L . A novel algorithm for imbalance data classification based on genetic algorithm improved SMOTE[J]. Arabian Journal for Science and Engineering, 2016, 41 (8): 3255- 3266. |
| 24 | FANG Y , GAO C , YAO Y Y . Granularity-driven sequential three-way decisions: a cost-sensitive approach to classification[J]. Information Sciences, 2020, 507, 644- 664. |
| 25 | THAI-NGHE N, GANTNER Z, SCHMIDT-THIEME L. Cost-sensitive learning methods for imbalanced data[C]//Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN). Barcelona: IEEE, 2010: 1-8. |
| [1] | 何怡,邵亚斌,冯慧,郭瑞莲. 基于快速超粒方生成算法的分类器模型[J]. 《山东大学学报(理学版)》, 2026, 61(5): 65-78. |
| [2] | 孙清,叶军,曾广财,宋苏洋,汪一心. 结合蝙蝠算法和紧密度改进的三支K-means算法[J]. 《山东大学学报(理学版)》, 2026, 61(1): 65-75. |
| [3] | 钱文彬,彭嘉豪,蔡星星. 基于邻域粒度与三支决策的知识表示学习方法[J]. 《山东大学学报(理学版)》, 2025, 60(7): 94-103. |
| [4] | 国栋凯,张钦然,李小南,易黄建. 基于新型阴影集的模糊C均值聚类算法[J]. 《山东大学学报(理学版)》, 2025, 60(1): 74-82. |
| [5] | 纪杰,孙承杰,单丽莉,尚伯乐,林磊. 基于提示学习的电信网络诈骗案件分类方法[J]. 《山东大学学报(理学版)》, 2024, 59(7): 113-121. |
| [6] | 黎超,廖薇. 基于医疗知识驱动的中文疾病文本分类模型[J]. 《山东大学学报(理学版)》, 2024, 59(7): 122-130. |
| [7] | 范敏,秦琴,李金海. 基于三支因果力的邻域推荐算法[J]. 《山东大学学报(理学版)》, 2024, 59(5): 12-22. |
| [8] | 朱金,付玉,管文瑞,王平心. 基于自然最近邻的样本扰动三支聚类[J]. 《山东大学学报(理学版)》, 2024, 59(5): 45-51. |
| [9] | 方逢祺,吴伟志. 决策集值系统中的知识约简[J]. 《山东大学学报(理学版)》, 2024, 59(5): 82-89. |
| [10] | 温欣,李德玉. 基于属性加权的ML-KNN方法[J]. 《山东大学学报(理学版)》, 2024, 59(3): 107-117. |
| [11] | 王茜,张贤勇. 不完备邻域加权多粒度决策理论粗糙集及三支决策[J]. 《山东大学学报(理学版)》, 2023, 58(9): 94-104. |
| [12] | 胡成祥,张莉,黄晓玲,王汇彬. 面向属性变化的动态邻域粗糙集知识更新方法[J]. 《山东大学学报(理学版)》, 2023, 58(7): 37-51. |
| [13] | 王君宇,杨亚锋,薛静轩,李丽红. 可拓序贯三支决策模型及应用[J]. 《山东大学学报(理学版)》, 2023, 58(7): 67-79. |
| [14] | 孟金旭,单鸿涛,黄润才,闫丰亭,李志伟,郑光远,刘一鸣,石昌通. 基于XLNet的双通道特征融合文本分类模型[J]. 《山东大学学报(理学版)》, 2023, 58(5): 36-45. |
| [15] | 凡嘉琛,王平心,杨习贝. 基于三支决策的密度敏感谱聚类[J]. 《山东大学学报(理学版)》, 2023, 58(1): 59-66. |
|
||