Malicious Office document detection technology based on entropy time series

doi:10.6040/j.issn.1671-9352.2.2018.072

Journal of Shandong University (Natural Science), 2019, Vol. 54, Issue (5): 1-7. doi: 10.6040/j.issn.1671-9352.2.2018.072


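ZHOU An-min, HU Lei, LIU Lu-ping*, JIA Peng, LIU Liang

College of Electronics and Information, Sichuan University, Chengdu 610065, Sichuan, China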

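Received: 2018-09-20  Publication date: 2019-05-20  Release date: 2019-05-09
Corresponding author: LIU Lu-ping  E-mail: 1515742050@qq.com; 529282048@qq.com
Biography: ZHOU An-min (1963-), male, research fellow; research interests: security defense and management. E-mail: 1515742050@qq.com
Funding:
National Key Basic Research Development Program project (2017YFB0802900)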

Malicious Office document detection technology based on entropy time series

An-min ZHOU, Lei HU, Lu-ping LIU*, Peng JIA, Liang LIU

College of Electronics and Information, Sichuan University, Chengdu 610065, Sichuan, China

Received:2018-09-20 Online:2019-05-20 Published:2019-05-09
Contact: Lu-ping LIU E-mail:1515742050@qq.com;529282048@qq.com
Supported by:
National Key Basic Research Development Program project (2017YFB0802900)


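Abstract:

To detect malicious Office (*.docx, *.rtf) documents more accurately, this paper proposes a method that detects malicious Office documents on the basis of their entropy time series. The method converts the differences between the binaries of malicious and non-malicious documents into differences between the power spectra of their file-entropy time series, and then applies three machine learning methods, IBk, Random Committee (RC) and Random Forest (RF), to learn from and classify the data. Experimental results show that the accuracy reaches 92.14% on docx documents, which are based on XML compression technology, and 98.20% on rich text format (rtf) files.

Key words: entropy time series, power spectrum, machine learning, malicious document detection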
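The abstract only outlines the feature-extraction step. As a minimal sketch under stated assumptions, it might look like this in Python: a fixed-size window slides over the raw file bytes, the Shannon entropy of each window forms the entropy time series, and a periodogram of that series gives its power spectrum. The 256-byte window and step, the mean removal and the plain DFT periodogram are illustrative choices, not parameters taken from the paper, and the function names `shannon_entropy`, `entropy_time_series` and `power_spectrum` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte chunk in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    # Sum of p * log2(1/p) over the byte values that occur in the chunk.
    return sum((c / n) * math.log2(n / c) for c in Counter(chunk).values())


def entropy_time_series(data: bytes, window: int = 256, step: int = 256) -> list[float]:
    """Entropy of each fixed-size window slid over the file, in file order.

    window and step are illustrative assumptions, not values from the paper.
    A file no longer than one window yields a single value for the whole file.
    """
    if len(data) <= window:
        return [shannon_entropy(data)] if data else []
    return [shannon_entropy(data[i:i + window])
            for i in range(0, len(data) - window + 1, step)]


def power_spectrum(series: list[float]) -> list[float]:
    """One-sided periodogram |X_k|^2 / N for k = 0..N//2, via a direct DFT.

    The mean is subtracted first so the k = 0 (DC) term does not dominate.
    The cost is O(N^2), acceptable for the short series of a sketch.
    """
    n = len(series)
    if n == 0:
        return []
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        spectrum.append((re * re + im * im) / n)
    return spectrum


if __name__ == "__main__":
    # Synthetic input: 1 KiB of zero padding followed by 1 KiB of evenly spread bytes.
    sample = bytes(1024) + bytes(range(256)) * 4
    series = entropy_time_series(sample)
    print(series)  # [0.0, 0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0]
    print(len(power_spectrum(series)))  # 5
```

On real files an FFT-based routine such as `numpy.fft.rfft`, or an averaged estimator such as Welch's method, would be the usual choice; the direct DFT here only keeps the sketch free of dependencies.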

Abstract:

To detect malicious Office (*.docx, *.rtf) documents more accurately, a detection method based on the entropy time series of documents is proposed. The method converts the difference between the binaries of malicious and non-malicious documents into a difference between the power spectra of their file-entropy time series, and then applies three machine learning methods, IBk, Random Committee (RC) and Random Forest (RF), to learn from and classify the data. Experimental results show that the accuracy reaches 92.14% on docx documents, which are based on XML compression technology, and 98.20% on rich text format (RTF) files.

Key words: entropy time series, power spectrum, machine learning, malicious document detection
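Of the three learners named above, IBk is Weka's instance-based k-nearest-neighbour classifier. A minimal sketch of that idea in plain Python, assuming each document has already been reduced to a power spectrum: the hypothetical helper `bin_spectrum` averages a spectrum into a fixed number of frequency bands so that files of different sizes give equal-length vectors, and `knn_predict` labels a query by majority vote among its k nearest training vectors. The band count, the Euclidean distance and the toy vectors are assumptions for illustration, not the authors' feature design or data; Random Committee and Random Forest are not sketched.

```python
import math
from collections import Counter


def bin_spectrum(spectrum: list[float], bins: int = 8) -> list[float]:
    """Average a variable-length spectrum into `bins` equal-width frequency bands."""
    n = len(spectrum)
    if n < bins:
        raise ValueError("spectrum has fewer points than bins")
    features = []
    for b in range(bins):
        band = spectrum[b * n // bins:(b + 1) * n // bins]  # never empty when n >= bins
        features.append(sum(band) / len(band))
    return features


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: list[tuple[list[float], str]], query: list[float], k: int = 1) -> str:
    """Label `query` by majority vote among its k nearest training vectors.

    Ties in the vote go to the label of the nearest neighbour, because
    Counter.most_common orders equal counts by first appearance.
    """
    if not train:
        raise ValueError("training set is empty")
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy, made-up feature vectors (not data from the paper).
    train = [
        ([0.1, 0.2], "benign"),
        ([0.0, 0.3], "benign"),
        ([5.0, 4.8], "malicious"),
        ([5.2, 5.1], "malicious"),
    ]
    print(knn_predict(train, [0.05, 0.25], k=1))  # benign
    print(knn_predict(train, [4.9, 5.0], k=3))    # malicious
```

Because k-NN stores the training set and defers all work to prediction time, it needs no training step, which is one reason instance-based learners are a common baseline next to ensembles such as Random Forest.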

CLC number: TP39

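ZHOU An-min, HU Lei, LIU Lu-ping, JIA Peng, LIU Liang. Malicious Office document detection technology based on entropy time series[J]. Journal of Shandong University (Natural Science), 2019, 54(5): 1-7.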

An-min ZHOU,Lei HU,Lu-ping LIU,Peng JIA,Liang LIU. Malicious Office document detection technology based on entropy time series[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2019, 54(5): 1-7.

Figures/Tables: 8 (Fig. 1 to Fig. 8)
