
Journal of Shandong University (Natural Science) ›› 2019, Vol. 54 ›› Issue (3): 102-109. doi: 10.6040/j.issn.1671-9352.1.2018.107


Fixed-radius nearest neighbor progressive competition algorithm (FRNNPC) for imbalanced classification

ZHOU Peng1,2, YI Jing1,3, ZHU Zhen-fang4, LIU Pei-yu1,2*

  1. School of Information Science & Engineering, Shandong Normal University, Jinan 250358, Shandong, China;
    2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250358, Shandong, China;
    3. School of Computer Science & Technology, Shandong Jianzhu University, Jinan 250014, Shandong, China;
    4. School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan 250357, Shandong, China
  • Published: 2019-03-19
  • About the authors: ZHOU Peng (1990- ), male, master's student; research interest: network information security. E-mail: zhoucepan1990@163.com. *Corresponding author: LIU Pei-yu (1960- ), male, Ph.D., professor; research interest: network information security. E-mail: 1257639125@qq.com
  • Supported by: the National Natural Science Foundation of China (61373148, 61502151); the Humanities and Social Sciences Foundation of the Ministry of Education (14YJC860042); the Natural Science Foundation of Shandong Province (ZR2014FL010)

Abstract: Many real-world datasets suffer from the class imbalance problem. When traditional classification algorithms are applied to imbalanced data, samples from the minority class are easily misclassified. To improve the classification accuracy of minority-class samples, this paper proposes a fixed-radius nearest neighbor progressive competition algorithm (FRNNPC). As a preprocessing step, FRNNPC uses the fixed-radius nearest neighbor (FRNN) rule to eliminate ineligible samples globally, and then applies the neighbors progressive competition (NPC) algorithm to the remaining candidates: the scores of the query sample's nearest neighbors are accumulated progressively until the score sum of one class exceeds that of the other. In short, the method handles the imbalance problem effectively and requires no manually set parameters. Experiments compare the proposed method with four representative algorithms on 10 imbalanced datasets and demonstrate its effectiveness.
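To make the two stages concrete, the following Python sketch shows one simple reading of the procedure described above: a fixed-radius filter first keeps only training samples close enough to the query, and the surviving candidates are then visited from nearest to farthest, contributing scores to their classes until one class overtakes the other. The helper names (fixed_radius_candidates, npc_predict), the inverse-distance scoring, the stopping guard, and the explicit radius argument are illustrative assumptions; the paper's actual scoring rule and radius are derived from the data so that no parameter needs to be set by hand.

```python
import numpy as np

def fixed_radius_candidates(X, y, query, radius):
    """Stage 1 (FRNN preprocessing): keep only training samples whose
    Euclidean distance to the query lies within the fixed radius."""
    dist = np.linalg.norm(X - query, axis=1)
    mask = dist <= radius
    return X[mask], y[mask], dist[mask]

def npc_predict(X, y, query, radius, eps=1e-12):
    """Stage 2 (progressive competition): visit the candidates from nearest
    to farthest, add each one's score (here 1/distance, an assumption) to
    its class total, and stop once one class total is strictly higher."""
    _, yc, dist = fixed_radius_candidates(X, y, query, radius)
    if yc.size == 0:
        return None                      # no candidate inside the radius
    order = np.argsort(dist)
    totals = {c: 0.0 for c in np.unique(yc)}
    for i in order:
        totals[yc[i]] += 1.0 / (dist[i] + eps)
        scores = sorted(totals.values(), reverse=True)
        # Guard (an assumption of this sketch): only stop after every
        # candidate class has contributed, so the toy version does not
        # degenerate to plain 1-NN on the very first neighbor.
        if min(totals.values()) > 0 and (len(scores) == 1 or scores[0] > scores[1]):
            break
    return max(totals, key=totals.get)
```

For a query q and training arrays X_train, y_train, calling npc_predict(X_train, y_train, q, radius) returns the predicted class label, or None when no training sample falls inside the radius.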

Key words: imbalanced data, nearest neighbors rule, pattern classification

CLC number: TP301.6