基于连续特征的未知协议消息聚类算法

doi:10.6040/j.issn.1671-9352.2.2018.136

Abstract

Abstract:

Clustering of unknown protocol message sequences is the basis for analyzing the protocol format. Based on the traditional K-means algorithm, this paper proposed a continuous-features-based unknown feature message clustering algorithm, which combines the continuity of protocol format field with string matching method. Firstly, coarsely cluster the measured data sets based on the continuity of the protocol format field, extract the initial clustering center of the K-means algorithm, and then iteratively process the data using the message distance and the iterative algorithm with improved convergence function to further cluster the message. Experiments show that compared with the traditional K-means algorithm, the proposed new method improves the accuracy of clustering by 17.58% and reduces the number of iterations by about 58.27%. Compared with EM algorithm and DBSCAN algorithm, the clustering accuracy and time are significantly improved.

Key words: unknown protocol, protocol reverse analysis, K-means algorithm, initial clustering centers

CLC Number:

TP181

Zheng-yu LU,Guang-song LI,Ying-zhu SHEN,Bin ZHANG. Unknown protocol message clustering algorithm based on continuous features[J].JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2019, 54(5): 37-43.

Figures/Tables 6

Fig.1

Fig.2

Fig.3

Fig.4

Table 1

Fig.5

References 19

1	吴礼发, 洪征, 潘璠. 网络协议逆向分析及应用[M]. 北京: 国防工业出版社, 2016.
	WU Lifa , HONG Zheng , PAN Pan . Network protocol reverse analysis and application[M]. Beijing: National Defense Industry Press, 2016.
2	MACQUEEN J. Some methods for classification and analysis of multivariate observations[J]. Proc of Berkeley Symposium on Mathematical Statistics & Probability. Berkley: California Press, 1965: 281-297.
3	NEEDLEMAN S B , WUNSCH C D . A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. Journal of Molecular Biology, 1970, 48 (3): 443- 453. doi: 10.1016/0022-2836(70)90057-4
4	CUI W, KANNAN J, WANG H J. Discoverer: automatic protocol reverse engineering from network traces[C]//Proceedings of 16th USENIX Security Symposium, Boston: Usenix Security, 2007.
5	HUANG Q, LEE P P C, ZHANG Z B. Exploiting intra-packet dependency for fine-grained protocol format inference[C]//2015 IFIP Networking Conference Toulouse: IEEE, 2015: 1-9.
6	SIJA B D , GOO Y H , SHIM K S , et al. A survey of automatic protocol reverse engineering approaches, methods, and tools on the inputs and outputs view[J]. Security and Communication Networks, 2018, 2018: 1- 17.
7	何超, 刘方, 曾曦, 等. 针对未知协议消息序列的聚类分析实现[J]. 通信技术, 2017, 50 (2): 277- 286. doi: 10.3969/j.issn.1002-0802.2017.02.014
	HE Chao , LIU Fang , ZENG Xi , et al. Clustering analysis of unknown protocol message sequence[J]. Communications Technology, 2017, 50 (2): 277- 286. doi: 10.3969/j.issn.1002-0802.2017.02.014
8	GUSFIELD Dan . Algorithms on stings, trees, and sequences[J]. ACM SIGACT News, 1997, 28 (4): 41- 60. doi: 10.1145/270563
9	YANG Q , WU X D . 10 challenging problems in data mining research[J]. International Journal of Information Technology & Decision Making, 2006, 5 (4): 597- 604.
10	SMYTH P . Clustering sequences with hidden markov models[J]. NIPS, 1997, 9: 648- 654.
11	OREBAUGH A , RAMIREZ G , BURKE J , et al. Wireshark & Ethereal network protocol analyzer toolkit[M]. Cambridge: Elsevier, 2007.
12	JOHNSON L . Weavings Women Doing Theology in Oceania[M]. Fiji: Wcavers, 2003.
13	王维彬, 钟润添. 一种基于贪心EM算法学习GMM的聚类算法[J]. 计算机仿真, 2007, 24 (2): 65- 68. doi: 10.3969/j.issn.1006-9348.2007.02.018
	WANG Weibin , ZHONG Runtian . A clustering algorithm based on greedy EM algorithm learning GMM[J]. Computer Simulation, 2007, 24 (2): 65- 68. doi: 10.3969/j.issn.1006-9348.2007.02.018
14	ESTER M, KRIEGEL HP, XU X. A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise[C]//International Conference on Knowledge Discovery and Data Mining. Palo Alto: AAAI Press, 1996: 226-231.
15	BAI L , CHENG X Q , LIANG J Y , et al. Fast density clustering strategies based on the K-means algorithm[J]. Pattern Recognition, 2017, 71: 375- 386. doi: 10.1016/j.patcog.2017.06.023
16	TÎRNĂUCĂ C , GÓMEZ-PÉREZ D , BALCÁZAR J L , et al. Global optimality in k-means clustering[J]. Information Sciences, 2018, 439/440: 79- 94. doi: 10.1016/j.ins.2018.02.001
17	NAZNIN F , SARKER R , ESSAM D . Progressive alignment method using genetic algorithm for multiple sequence alignment[J]. IEEE Transactions on Evolutionary Computation, 2012, 16 (5): 615- 631. doi: 10.1109/TEVC.2011.2162849
18	LEE C T , PENG S L . A pairwise alignment algorithm for long sequences of high similarity[M]. Singapore: Springer, 2017: 279- 287.
19	WIDE MAWI Working Group[EB/OL].[2018-5-20]. http://mawi.wide.ad.jp/.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 10

[1]	LUO Si-te, LU Li-qian, CUI Ruo-fei, ZHOU Wei-wei, LI Zeng-yong*. Monte-Carlo simulation of photons transmission at alcohol wavelength in skin tissue and design of fiber optic probe[J]. J4, 2013, 48(1): 46 -50 .
[2]	SHI Kai-quan. P-information law intelligent fusion and soft information #br# image intelligent generation[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(04): 1 -17 .
[3]	PEI Sheng-yu，ZHOU Yong-quan. A mult-objective particle swarm optimization algorithm based on the chaotic mutation[J]. J4, 2010, 45(7): 18 -23 .
[4]	ZHAO Jun1, ZHAO Jing2, FAN Ting-jun1*, YUAN Wen-peng1,3, ZHANG Zheng1, CONG Ri-shan1. Purification and anti-tumor activity examination of water-soluble asterosaponin from Asterias rollestoni Bell[J]. J4, 2013, 48(1): 30 -35 .
[5]	YANG Yong-wei1, 2, HE Peng-fei2, LI Yi-jun2,3. On strict filters of BL-algebras#br#[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(03): 63 -67 .
[6]	MENG Xiang-bo1, ZHANG Li-dong1, DU Zi-ping2. Investment and reinsurance strategy for insurers under #br# mean-variance criterion with jumps#br#[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(05): 36 -40 .
[7]	Ming-Chit Liu. THE TWO GOLDBACH CONJECTURES[J]. J4, 2013, 48(2): 1 -14 .
[8]	TANG Xiao-hong1, HU Wen-xiao2*, WEI Yan-feng2, JIANG Xi-long2, ZHANG Jing-ying2, SHAO Xue-dong3. Screening and biological characteristics studies of wide wine-making yeasts[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2014, 49(03): 12 -17 .
[9]	WANG Kai-rong, GAO Pei-ting. Two mixed conjugate gradient methods based on DY[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2016, 51(6): 16 -23 .
[10]	ZHANG Ming-ming, QIN Yong-bin. A non-deterministic finite automata minimization method based on preorder relation[J]. J4, 2010, 45(7): 34 -38 .

算法名称	NWK	K-means	EM	DBSCAN
噪声分离功能	有	无	无	有

Unknown protocol message clustering algorithm based on continuous features

RichHTML

PDF (PC)

Abstract

Cite this article

share this article

Figures/Tables 6

References 19

Related Articles 1

Metrics

Comments

Recommended 10