A fast DGA domain detection algorithm based on deep learning

doi:10.6040/j.issn.1671-9352.0.2019.249

Journal of Shandong University (Natural Science), 2019, Vol. 54, Issue (7): 106-112.

LIU Yang¹, ZHAO Ke-jun¹,²*, GE Lian-sheng¹, LIU Heng³

1. Informatization Office, Shandong University, Jinan 250100, Shandong, China;
2. School of Computer Science and Technology, Shandong University, Qingdao 266237, Shandong, China;
3. Zhongdian Great Wall Internetworking System Application Co., Ltd, Beijing 102209, China

Published: 2019-06-27
About the authors: LIU Yang (born 1976), female, master's degree, engineer; research interests: data management and network security. E-mail: yang@sdu.edu.cn. *Corresponding author: ZHAO Ke-jun (born 1981), male, PhD candidate, engineer; research interests: network security and machine learning. E-mail: zhaokejun@sdu.edu.cn
Funding: National Key R&D Program of China during the 13th Five-Year Plan (2017YFB0803004); CERNET Next Generation Internet Technology Innovation Project (NGII20150412)

Abstract

Abstract: This paper proposes CNN-LSTM-Concat, a fast deep-learning algorithm for classifying DGA domain names. Multiple one-dimensional convolutional layers process the characters of a domain name as a sequence, and an LSTM layer strengthens the capture of long-distance dependencies between characters. Converting the LSTM's multi-step sequence input into a single vector input greatly increases training and detection speed while preserving detection performance. Experiments on public datasets show that the method classifies DGA domains with an accuracy of 98.32%. It is also more accurate than the mainstream LSTM method, and its detection is 6.41 times faster.

Key words: domain generation algorithm (DGA), convolutional neural network (CNN), LSTM
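
As a rough illustration of the idea in the abstract, the sketch below encodes a domain name as integer character indices, applies toy one-dimensional convolutions, and pools over positions so that the whole sequence collapses into one fixed-length vector. A recurrent layer could then consume that vector in a single step instead of one step per character. Everything here is an assumption made for illustration: the alphabet, the helper names `encode_domain`, `conv1d`, and `to_single_vector`, the hand-picked kernel weights, and the use of max-over-time pooling. The abstract does not state the paper's layer configuration or how the single vector is formed, and a real model would typically convolve learned character embeddings rather than raw indices.

```python
import string

# Hypothetical alphabet: index 0 is reserved for padding and unknown characters.
ALPHABET = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_domain(domain, max_len=16):
    """Map a domain name to a fixed-length list of integer character indices."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))  # right-pad with zeros


def conv1d(sequence, kernel):
    """Valid (unpadded) 1D convolution of a numeric sequence with one kernel."""
    width = len(kernel)
    return [
        sum(k * x for k, x in zip(kernel, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]


def to_single_vector(sequence, kernels):
    """Collapse a sequence into one vector: one max-pooled value per kernel."""
    return [max(conv1d(sequence, kernel)) for kernel in kernels]


if __name__ == "__main__":
    kernels = [[1.0, -1.0], [0.5, 0.5, 0.5]]  # toy weights, not learned
    encoded = encode_domain("example.com")
    vector = to_single_vector(encoded, kernels)
    print(len(encoded), len(vector))
```

The output vector has one entry per kernel regardless of the domain's length, which is what lets a downstream layer run once per domain rather than once per character.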

CLC number: TP391

LIU Yang, ZHAO Ke-jun, GE Lian-sheng, LIU Heng. A fast DGA domain detection algorithm based on deep learning[J]. Journal of Shandong University (Natural Science), 2019, 54(7): 106-112.

