JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2019, Vol. 54 ›› Issue (5): 1-7. doi: 10.6040/j.issn.1671-9352.2.2018.072


Malicious Office document detection technology based on entropy time series

An-min ZHOU, Lei HU, Lu-ping LIU*, Peng JIA, Liang LIU

  1. College of Electronics and Information, Sichuan University, Chengdu 610065, Sichuan, China
  • Received:2018-09-20 Online:2019-05-20 Published:2019-05-09
  • Contact: Lu-ping LIU E-mail:1515742050@qq.com;529282048@qq.com
  • Supported by:
    National Key Basic Research Development Program of China (2017YFB0802900)

Abstract:

To detect malicious Office documents (*.docx, *.rtf) more accurately, a detection method based on the entropy time series of a document is proposed. The method maps the difference between the binaries of malicious and non-malicious documents onto the difference between the power spectra of their file-entropy time series, and then trains and tests three machine learning classifiers on these data: IBK, Random Committee (RC) and Random Forest (RF). The experimental results show that detection accuracy can reach 92.14% for docx documents, which use XML compression, and 98.20% for rich text format (RTF) files.
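The pipeline summarized in the abstract (file bytes, then an entropy time series, then a power spectrum used as classifier input) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 256-byte non-overlapping window, the plain periodogram estimator, and the function names `shannon_entropy`, `entropy_time_series` and `power_spectrum` are all assumptions made for illustration, since the abstract does not state these details.

```python
import cmath
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_time_series(data: bytes, window: int = 256) -> list[float]:
    """Entropy of consecutive non-overlapping windows.

    The window size is an assumption; the abstract does not specify it.
    A trailing partial window is ignored.
    """
    return [
        shannon_entropy(data[i:i + window])
        for i in range(0, len(data) - window + 1, window)
    ]


def power_spectrum(series: list[float]) -> list[float]:
    """Periodogram |X_k|^2 / N of the mean-removed series, for k = 0..N//2.

    A naive O(N^2) DFT is used to stay dependency-free; this estimator is
    an assumption, not necessarily the one used in the paper.
    """
    n = len(series)
    if n == 0:
        return []
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum
```

Under these assumptions, `power_spectrum(entropy_time_series(raw_bytes))` would yield one feature vector per document, which could then be passed to a classifier such as the IBK, RC or RF learners named above. For large files, an FFT-based estimator would be preferable to the naive DFT shown here.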

Key words: entropy time series, power spectrum, machine learning, malicious document detection

CLC Number: 

  • TP39

Fig.1 Document structure of docx

Fig.2 Entropy time series

Fig.3 Experimental steps

Fig.4 Power spectrum comparison diagram

Fig.5 Results of accuracy test

Fig.6 Results of recall test

Fig.7 Results of F value test

Fig.8 Comparison of detection capability
