
Journal of Shandong University (Natural Science), 2016, Vol. 51, Issue (11): 50-57. doi: 10.6040/j.issn.1671-9352.2.2015.273


Intrusion detection on imbalanced dataset

DU Hong-le, ZHANG Yan, ZHANG Lin

  1. School of Mathematics and Computer Application, Shangluo University, Shangluo 726000, Shaanxi, China
  • Received: 2015-09-21 Online: 2016-11-20 Published: 2016-11-22
  • About the author: DU Hong-le (born 1979), male, master's degree, lecturer; research interests: machine learning and data mining. E-mail: dhl5597@126.com
  • Funding:
    Natural Science Basic Research Plan of Shaanxi Province (2015JM6347); Science and Technology Program of the Shaanxi Provincial Department of Education (15JK1218); Science and Technology Research Project of Shangluo University (15sky010)

Abstract: In the transductive support vector machine (TSVM), samples that are mislabeled during one iteration propagate their errors: they lower the labeling accuracy of the next iteration, so the errors accumulate and ultimately shift the final classification hyperplane. On imbalanced datasets, the traditional support vector machine (SVM) has a high classification error rate, so TSVM labels samples with low accuracy in each iteration. To address this, this paper proposes a transductive learning algorithm for imbalanced datasets. The algorithm dynamically computes the penalty factor of each class from the density distribution of that class's support vectors, which improves the labeling accuracy in each iteration. While inheriting the progressive labeling and dynamic adjustment rules, the algorithm reduces the offset of the classification hyperplane. Finally, simulation experiments on the KDD CUP99 dataset show that the algorithm improves the classification performance of TSVM on imbalanced data, especially for minority-class samples, and lowers the false alarm rate and the missed detection rate.

Key words: support vector machine, transductive learning, semi-supervised learning, imbalanced dataset, intrusion detection
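
The abstract does not give the formula that maps support-vector density to a per-class penalty factor, so the following is only a minimal sketch under stated assumptions. It uses the mean pairwise distance among a class's support vectors as a crude inverse-density proxy and gives sparser classes a proportionally larger penalty. The function names (`mean_pairwise_distance`, `class_penalties`, `alarm_rates`) and the scaling rule are hypothetical and are not taken from the paper. The second part computes the two error rates named in the abstract using their usual definitions: false alarm rate = FP / (FP + TN) and missed detection rate = FN / (FN + TP).

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs (larger value = sparser class)."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def class_penalties(support_vectors, base_c=1.0):
    """Assumed rule, NOT the paper's formula: scale base_c for each class by its
    support-vector spread relative to the densest class."""
    spread = {label: mean_pairwise_distance(sv)
              for label, sv in support_vectors.items()}
    positive = [s for s in spread.values() if s > 0]
    if not positive:
        # No usable density information: fall back to equal penalties.
        return {label: base_c for label in spread}
    smallest = min(positive)
    return {label: base_c * (s / smallest if s > 0 else 1.0)
            for label, s in spread.items()}


def alarm_rates(y_true, y_pred, attack=1):
    """Return (false alarm rate, missed detection rate) with `attack` as positive."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == attack:
            if pred == attack:
                tp += 1
            else:
                fn += 1
        else:
            if pred == attack:
                fp += 1
            else:
                tn += 1
    false_alarm = fp / (fp + tn) if (fp + tn) else 0.0
    missed = fn / (fn + tp) if (fn + tp) else 0.0
    return false_alarm, missed


# Toy data: the "attack" class has more widely spread support vectors.
svs = {"normal": [(0, 0), (1, 0)], "attack": [(0, 0), (3, 4)]}
penalties = class_penalties(svs)
rates = alarm_rates([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
```

In a progressive transductive loop, penalties of this kind would be recomputed after each retraining step and passed to the SVM solver as class-specific cost parameters before the next batch of unlabeled samples is labeled.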

CLC Number: TP301