
Journal of Shandong University (Natural Science), 2022, Vol. 57, Issue 7: 85-93. doi: 10.6040/j.issn.1671-9352.2.2021.117


Malicious evasion sample detection based on dynamic API call sequence and machine learning

ZHANG Jie1,2, PENG Guo-jun1,2*, YANG Xiu-zhang1,2

  1. Key Laboratory of Space Information Security and Trusted Computing, Ministry of Education, Wuhan 430072, Hubei, China
  2. School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, Hubei, China
  • Published: 2022-06-29
  • About the authors: ZHANG Jie (born 1997), male, master's student; research interests: network security and software security. E-mail: jason1314@whu.edu.cn
    *Corresponding author: PENG Guo-jun (born 1979), male, Ph.D., professor; research interests: network and information system security. E-mail: guojpeng@whu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62172308, U1626107, 61972297, 62172144)

Abstract: This paper analyzes the evasion behavior of malicious evasion samples, summarizes the set of API functions that such samples commonly use for evasion, and proposes a method for detecting malicious evasion samples based on dynamic API call sequences and machine learning. In the feature engineering stage, it proposes an algorithm for measuring the weight of evasion API functions and optimizes term-frequency processing to enhance the feature vector values of those functions. The resulting method detects malicious evasion samples with an accuracy of up to 95.09%.

Key words: evasion sample, API call sequence, machine learning
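
The abstract does not give the weight measurement algorithm or the evasion API set, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the enhancement can be modeled as a per-API multiplier applied to an ordinary smoothed TF-IDF value. The function name `weighted_tfidf`, the API names, and the weights in `EVASION_WEIGHTS` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter

# Hypothetical evasion-related APIs and weights (illustrative assumptions,
# not the API set or weights derived in the paper).
EVASION_WEIGHTS = {
    "IsDebuggerPresent": 3.0,
    "GetTickCount": 2.0,
    "Sleep": 2.0,
}


def weighted_tfidf(sequences, evasion_weights):
    """Return (vocab, vectors): one boosted TF-IDF vector per API call sequence."""
    vocab = sorted({api for seq in sequences for api in seq})
    n_docs = len(sequences)
    # Document frequency: number of sequences in which each API appears.
    df = Counter(api for seq in sequences for api in set(seq))
    vectors = []
    for seq in sequences:
        counts = Counter(seq)
        total = len(seq) or 1  # avoid division by zero for empty sequences
        row = []
        for api in vocab:
            tf = counts[api] / total
            idf = math.log((1 + n_docs) / (1 + df[api])) + 1.0  # smoothed IDF
            boost = evasion_weights.get(api, 1.0)  # 1.0 for non-evasion APIs
            row.append(tf * idf * boost)
        vectors.append(row)
    return vocab, vectors


# Two toy API call sequences, as a sandbox trace might record them.
samples = [
    ["CreateFileW", "IsDebuggerPresent", "Sleep", "WriteFile"],
    ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"],
]
vocab, vectors = weighted_tfidf(samples, EVASION_WEIGHTS)
```

The resulting vectors could then be passed to any standard classifier. The multiplier only rescales features for APIs listed in the weight table, so sequences that never call an evasion API produce the same vectors as plain TF-IDF.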

CLC number: TP393.08