
Journal of Shandong University (Natural Science) ›› 2019, Vol. 54 ›› Issue (3): 102-109. doi: 10.6040/j.issn.1671-9352.1.2018.107


Fixed-radius nearest neighbor progressive competition algorithm (FRNNPC) for imbalanced classification

ZHOU Peng1,2, YI Jing1,3, ZHU Zhen-fang4, LIU Pei-yu1,2*

  1. School of Information Science & Engineering, Shandong Normal University, Jinan 250358, Shandong, China;
    2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan 250358, Shandong, China;
    3. School of Computer Science & Technology, Shandong Jianzhu University, Jinan 250014, Shandong, China;
    4. School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan 250357, Shandong, China
  • Published: 2019-03-19
  • About the authors: ZHOU Peng (1990- ), male, master's student; research interest: network information security. E-mail: zhoucepan1990@163.com. *Corresponding author: LIU Pei-yu (1960- ), male, Ph.D., professor; research interest: network information security. E-mail: 1257639125@qq.com
  • Supported by: the National Natural Science Foundation of China (61373148, 61502151); the Humanities and Social Sciences Foundation of the Ministry of Education (14YJC860042); the Natural Science Foundation of Shandong Province (ZR2014FL010)

Abstract: Many real-world datasets suffer from the class imbalance problem. When traditional classification algorithms are applied to imbalanced data, samples from the minority class are easily misclassified. To improve the classification accuracy of minority-class samples, this paper proposes a fixed-radius nearest neighbor progressive competition algorithm (FRNNPC). As a preprocessing step, FRNNPC uses the fixed-radius nearest neighbor (FRNN) rule to eliminate ineligible samples globally, and then applies the neighbors progressive competition (NPC) algorithm to the remaining candidates: the scores of the query sample's nearest neighbors are accumulated progressively until the score sum of one class exceeds that of the other. In short, the method handles the imbalance problem effectively and requires no manually set parameters. Experiments compare the proposed method with four representative algorithms on 10 imbalanced datasets and demonstrate its effectiveness.
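To make the two stages concrete, the following Python sketch shows one simple reading of the procedure described above: a fixed-radius filter first keeps only training samples close enough to the query, and the surviving candidates are then visited from nearest to farthest, contributing scores to their classes until one class overtakes the other. The helper names (fixed_radius_candidates, npc_predict), the inverse-distance scoring, the stopping guard, and the explicit radius argument are illustrative assumptions; the paper's actual scoring rule and radius are derived from the data so that no parameter needs to be set by hand.

```python
import numpy as np

def fixed_radius_candidates(X, y, query, radius):
    """Stage 1 (FRNN preprocessing): keep only training samples whose
    Euclidean distance to the query lies within the fixed radius."""
    dist = np.linalg.norm(X - query, axis=1)
    mask = dist <= radius
    return X[mask], y[mask], dist[mask]

def npc_predict(X, y, query, radius, eps=1e-12):
    """Stage 2 (progressive competition): visit the candidates from nearest
    to farthest, add each one's score (here 1/distance, an assumption) to
    its class total, and stop once one class total is strictly higher."""
    _, yc, dist = fixed_radius_candidates(X, y, query, radius)
    if yc.size == 0:
        return None                      # no candidate inside the radius
    order = np.argsort(dist)
    totals = {c: 0.0 for c in np.unique(yc)}
    for i in order:
        totals[yc[i]] += 1.0 / (dist[i] + eps)
        scores = sorted(totals.values(), reverse=True)
        # Guard (an assumption of this sketch): only stop after every
        # candidate class has contributed, so the toy version does not
        # degenerate to plain 1-NN on the very first neighbor.
        if min(totals.values()) > 0 and (len(scores) == 1 or scores[0] > scores[1]):
            break
    return max(totals, key=totals.get)
```

For a query q and training arrays X_train, y_train, calling npc_predict(X_train, y_train, q, radius) returns the predicted class label, or None when no training sample falls inside the radius.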

Key words: imbalanced data, nearest neighbors rule, pattern classification

CLC number: TP301.6