Malicious evasion sample detection based on dynamic API call sequence and machine learning

doi:10.6040/j.issn.1671-9352.2.2021.117

Abstract

This paper analyzes the evasion behavior of malicious evasion samples, summarizes the set of API functions these samples commonly use for evasion, and proposes a detection method for malicious evasion samples based on dynamic API call sequences and machine learning. In the feature engineering stage, the paper proposes a weight measurement algorithm for evasion API functions and optimizes the term-frequency processing, which increases the feature-vector values of the evasion API functions. The method reaches an accuracy of 95.09% in detecting malicious evasion samples.
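The abstract does not spell out the weight measurement algorithm, so the following is only a minimal sketch of the general idea: build term-frequency features from an API call sequence and scale up the components that belong to evasion API functions. The evasion API set, the `boost` factor, the smoothed IDF formula, and the function name `weighted_tfidf` are all assumptions made for illustration, not the authors' definitions.

```python
import math
from collections import Counter

# Hypothetical evasion API set; the set summarized in the paper is not
# listed in the abstract.
EVASION_APIS = frozenset({"IsDebuggerPresent", "GetTickCount", "Sleep"})


def weighted_tfidf(sequences, evasion_apis=EVASION_APIS, boost=2.0):
    """Return (vocabulary, vectors) for a list of API call sequences.

    Each vector component is tf * idf, multiplied by `boost` when the API
    is in `evasion_apis`. This is a generic weighted TF-IDF, assumed here
    as a stand-in for the paper's weighting algorithm.
    """
    vocabulary = sorted({api for seq in sequences for api in seq})
    n_docs = len(sequences)
    # Document frequency: in how many sequences each API appears.
    doc_freq = Counter(api for seq in sequences for api in set(seq))

    vectors = []
    for seq in sequences:
        counts = Counter(seq)
        total = len(seq) or 1  # avoid division by zero for empty traces
        vector = []
        for api in vocabulary:
            tf = counts[api] / total
            idf = math.log((1 + n_docs) / (1 + doc_freq[api])) + 1.0
            weight = boost if api in evasion_apis else 1.0
            vector.append(tf * idf * weight)
        vectors.append(vector)
    return vocabulary, vectors


if __name__ == "__main__":
    traces = [
        ["Sleep", "CreateFileW"],
        ["CreateFileW", "WriteFile"],
    ]
    vocab, vecs = weighted_tfidf(traces)
    for name, value in zip(vocab, vecs[0]):
        print(f"{name}: {value:.4f}")
```

The resulting vectors could then be passed to any standard classifier; which model the paper actually uses is not stated in the abstract.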

Key words: evasion sample, API sequence, machine learning

CLC Number: TP393.08

ZHANG Jie, PENG Guo-jun, YANG Xiu-zhang. Malicious evasion sample detection based on dynamic API call sequence and machine learning[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2022, 57(7): 85-93.

