Journal of Shandong University (Natural Science) ›› 2019, Vol. 54 ›› Issue (3): 102-109. doi: 10.6040/j.issn.1671-9352.1.2018.107
ZHOU Peng1,2, YI Jing1,3, ZHU Zhen-fang4, LIU Pei-yu1,2*
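Abstract: Many real-world datasets suffer from the class imbalance problem. When traditional classification algorithms are applied to imbalanced data, minority-class samples are easily misclassified. To improve classification accuracy on minority-class samples, a neighbors progressive competition algorithm based on fixed-radius nearest neighbors (FRNNPC) is proposed. The dataset is first preprocessed with fixed-radius nearest neighbors (FRNN), which eliminates unnecessary data at the global level. The neighbors progressive competition (NPC) algorithm is then applied to the resulting candidate data: the scores of the samples neighboring the query sample are computed progressively until the total score of one class exceeds that of the other. In short, the method handles the imbalance problem effectively and requires no manually set parameters. Experiments compare the proposed method with four representative algorithms on ten imbalanced datasets and verify its effectiveness.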
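The two stages described in the abstract (FRNN filtering, then a progressive competition among the remaining neighbors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the radius rule, the per-sample score, or the exact stopping condition, so the sketch assumes a radius equal to the mean pairwise distance of the training set, a score of 1 / (class size) per neighbor, and a hypothetical `margin` argument for the stopping test. The function names are likewise invented for illustration.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Assumed radius rule: average Euclidean distance over all training pairs."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def frnn_npc_predict(train, query, margin=0.5):
    """Sketch of FRNN filtering followed by a progressive competition.

    train  : list of (point, label) tuples, point being a tuple of floats
    query  : tuple of floats
    margin : hypothetical stopping threshold (the paper reports needing
             no manually set parameter; this is a placeholder)
    """
    radius = mean_pairwise_distance([x for x, _ in train])

    # Class sizes drive the assumed score: rarer classes weigh more.
    class_sizes = {}
    for _, label in train:
        class_sizes[label] = class_sizes.get(label, 0) + 1

    # FRNN step: keep only samples inside the fixed radius around the query.
    scored = [(math.dist(x, query), label) for x, label in train]
    in_radius = [s for s in scored if s[0] <= radius]
    # Fall back to the full training set if the radius captures nothing.
    ordered = sorted(in_radius or scored, key=lambda s: s[0])

    # Competition step: add neighbors nearest-first until one class leads.
    totals = {label: 0.0 for label in class_sizes}
    for _, label in ordered:
        totals[label] += 1.0 / class_sizes[label]
        ranked = sorted(totals.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        if ranked[0] - runner_up >= margin:
            break
    return max(totals, key=totals.get)


# Toy imbalanced data: six majority samples (label 0), two minority (label 1).
train = [
    ((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
    ((1.0, 1.0), 0), ((0.5, 0.5), 0), ((0.2, 0.8), 0),
    ((3.0, 3.0), 1), ((3.2, 3.1), 1),
]
print(frnn_npc_predict(train, (3.1, 3.0)))
print(frnn_npc_predict(train, (0.4, 0.4)))
```

Weighting each neighbor by the inverse of its class size is one simple way to keep a large majority class from winning on sheer numbers. The scoring used in the paper may differ.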