Journal of Shandong University (Natural Science), 2019, Vol. 54, Issue 3: 102-109. doi: 10.6040/j.issn.1671-9352.1.2018.107

Fixed-radius nearest neighbor progressive competition algorithm for imbalanced classification

ZHOU Peng1,2, YI Jing1,3, ZHU Zhen-fang4, LIU Pei-yu1,2*   

  1. School of Information Science & Engineering, Shandong Normal University, Jinan 250358, Shandong, China;
  2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250358, Shandong, China;
  3. School of Computer Science & Technology, Shandong Jianzhu University, Jinan 250014, Shandong, China;
  4. School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan 250357, Shandong, China
  • Published: 2019-03-19

Abstract: Many real-world datasets suffer from class imbalance. When traditional classification algorithms are applied to imbalanced data, they tend to misclassify samples of the minority class. To improve classification accuracy on minority-class samples, this paper proposes a fixed-radius nearest neighbor progressive competition algorithm (FRNNPC). As a preprocessing step, FRNNPC globally eliminates ineligible samples using the fixed-radius nearest neighbor rule. It then applies neighbors progressive competition (NPC) to the remaining candidate data, progressively accumulating the scores of the query sample's nearest neighbors until the total score of one class exceeds that of the other. The method handles the imbalance problem effectively and requires no manually set parameters. Experiments comparing the proposed method with four representative algorithms on 10 imbalanced datasets demonstrate its effectiveness.
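
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the radius rule, the scoring function, or the exact stopping test, so all three are assumptions made here for illustration. The function name `frnnpc_sketch` is hypothetical, the radius is taken as the mean distance from the query to the training samples, each neighbor scores the inverse of its class size, and the competition stops once both classes have scored and one of them leads.

```python
import math
from collections import Counter


def frnnpc_sketch(train, labels, query):
    """Toy two-stage classifier loosely following the abstract (assumptions inside)."""
    dists = [math.dist(x, query) for x in train]

    # Assumption: the radius is derived from the data (mean query-to-sample
    # distance), so no parameter is set by hand. The paper's rule may differ.
    radius = sum(dists) / len(dists)

    # Stage 1: fixed-radius filter. Keep only samples within the radius,
    # ordered from nearest to farthest. The nearest sample always survives
    # because the minimum distance cannot exceed the mean.
    candidates = sorted(
        ((d, y) for d, y in zip(dists, labels) if d <= radius),
        key=lambda pair: pair[0],
    )

    # Assumption: a neighbor scores 1 / (size of its class), so neighbors
    # from the minority class carry more weight.
    class_size = Counter(labels)
    scores = Counter()

    # Stage 2: progressive competition. Add neighbors one at a time.
    for _, y in candidates:
        scores[y] += 1.0 / class_size[y]
        # Assumption: stop once at least two classes have scored and one leads.
        if len(scores) > 1:
            (best, best_score), (_, second_score) = scores.most_common(2)
            if best_score > second_score:
                return best

    # Only one class was seen inside the radius, or the scores stayed tied.
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    train = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5.2, 5)]
    labels = ["maj", "maj", "maj", "maj", "min", "min"]
    print(frnnpc_sketch(train, labels, (4.8, 5)))
    print(frnnpc_sketch(train, labels, (0.5, 0.5)))
```

In this toy data the radius filter alone leaves only one class near each query, so the competition loop ends without a contest. With overlapping classes the inverse-class-size weighting would decide the outcome.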

Key words: imbalanced data, nearest neighbors rule, pattern classification

CLC Number: TP301.6