Journal of Shandong University (Natural Science) ›› 2019, Vol. 54 ›› Issue (5): 1-7. doi: 10.6040/j.issn.1671-9352.2.2018.072
An-min ZHOU, Lei HU, Lu-ping LIU*, Peng JIA, Liang LIU
Abstract:
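To detect malicious Office documents (*.docx, *.rtf) more accurately, this paper proposes a detection method based on the entropy time series of a document. The method converts the differences between the binary content of malicious and benign documents into differences between the power spectra of their file-entropy time series, and then applies three machine learning methods, IBk, random committee (RC), and random forest (RF), to learn from and classify the data. Experimental results show that accuracy reaches 92.14% for docx documents, which are based on XML and compression, and 98.20% for rich text format (rtf) files.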
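The feature-extraction idea in the abstract (file bytes, then an entropy time series, then its power spectrum) can be illustrated with a minimal sketch. The abstract does not give the window size, the windowing scheme, or the spectral estimator, so the 256-byte non-overlapping windows and the plain periodogram below are assumptions chosen for illustration, and the function names are hypothetical. The classification step (IBk, RC, RF) is omitted.

```python
import cmath
import math
import os
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chunk).values())


def entropy_series(data: bytes, window: int = 256) -> list:
    """Entropy of consecutive non-overlapping windows.

    The window size and the lack of overlap are assumptions, not values
    taken from the paper.
    """
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


def power_spectrum(series: list) -> list:
    """Periodogram of the mean-removed series via a naive DFT.

    O(n^2) and stdlib-only, which is adequate for a sketch; the paper's
    actual estimator is not stated in the abstract.
    """
    n = len(series)
    if n == 0:
        return []
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


if __name__ == "__main__":
    # Synthetic stand-in for a document: a low-entropy run followed by random bytes.
    sample = b"A" * 2048 + os.urandom(2048)
    series = entropy_series(sample)
    features = power_spectrum(series)
    print(len(series), len(features))
```

In a pipeline of this kind, the spectrum values (or a fixed-length summary of them) would serve as the feature vector passed to a classifier; how the authors fix the feature length across files of different sizes is not stated in the abstract.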
|