Malicious Office document detection technology based on entropy time series

An-min ZHOU, Lei HU, Lu-ping LIU*, Peng JIA, Liang LIU

  1. College of Electronics and Information, Sichuan University, Chengdu 610065, Sichuan, China
To detect malicious Office documents (*.docx, *.rtf) more accurately, a detection method based on the entropy time series of a document is proposed. The method converts the binary-level differences between malicious and non-malicious documents into differences between the power spectra of their file-entropy time series, and then trains and tests three machine learning classifiers on these data: IBK, Random Committee (RC), and Random Forest (RF). Experimental results show that detection accuracy reaches 92.14% for docx documents, which use XML compression, and 98.20% for rich text format (RTF) files.
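
As an illustration of the feature extraction described above, the following is a minimal sketch rather than the authors' implementation. It assumes a fixed, non-overlapping window of 256 bytes, Shannon entropy measured in bits per byte, and a simple periodogram computed with a naive discrete Fourier transform; the abstract does not state the window size or the spectral estimator, and the function names are hypothetical. The classification step (IBK, RC, RF) is not shown.

```python
import cmath
import math
from collections import Counter
from typing import List


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chunk).values())


def entropy_series(data: bytes, window: int = 256) -> List[float]:
    """Entropy of consecutive, non-overlapping windows.

    The window size of 256 bytes is an assumption made for this sketch.
    """
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


def power_spectrum(series: List[float]) -> List[float]:
    """Periodogram |X_k|^2 / N of the mean-removed series.

    Uses a naive O(N^2) DFT to stay within the standard library; only the
    non-negative frequencies k = 0 .. N // 2 are returned.
    """
    n = len(series)
    if n == 0:
        return []
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum
```

In use, the bytes of a document would be read from disk, passed through entropy_series, and the resulting power_spectrum values would serve as the feature vector given to a classifier. For long files, a fast Fourier transform from a numerical library would be the practical replacement for the naive DFT.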

Key words: entropy time series, power spectrum, machine learning, malicious document detection

Document structure of docx

Entropy time series

Experimental steps

Power spectrum comparison diagram

Results of accuracy test

Results of recall test

Results of F value test

Comparison of detection capability

