基于密度峰值和网络嵌入的重叠社区发现

doi:10.6040/j.issn.1671-9352.4.2020.145

摘要/Abstract

摘要：

密度峰值是一种基于密度的聚类算法, 该算法假设类簇中心点具有较高的密度且被密度较小的节点包围。由于图结构的性质, 密度峰值无法直接适用于网络结构, 现有的基于密度峰值的社区发现算法大部分是基于图的拓扑结构或者邻接矩阵度量节点近似度, 这种方法往往引入较大的计算复杂度。文中结合网络嵌入方法通过低维向量表示网络中的节点信息, 提出了一种基于密度峰值和网络嵌入的重叠社区发现算法(overlapping community detection based on density network embedding, OCDDNE)。该算法首先通过网络嵌入获取节点的网络结构特征, 然后基于改进的密度峰值的方法对嵌入后的节点向量进行多标签聚类, 使编码后的向量之间的结构关系得到更好的揭示, 从而发现网络中的重叠社区结构。在人工网络和真实网络的验证实验表明, 该算法可以有效的挖掘网络中的重叠社区结构, 并在结构复杂度较高的网络中优于其他算法。

关键词: 重叠社区发现, 网络嵌入, 密度峰值, 复杂网络, 隶属度

Abstract:

Density peaks is a density-based clustering algorithm. It assumes that the center points of the cluster have higher density and are surrounded by nodes with lower density. Due to the character of graph structure, density peaks cannot be directly applied to network structure, and most of density peaks based community detection algorithms based on density peaks are based on graph topology or adjacency matrix to measure node approximation, which often leads to great computational complexity. This paper proposes an overlapping community detection algorithm based on density and network embedding (OCDDNE). The proposed algorithm firstly embeds the network structure characteristics of nodes in network structure through network embedding, and then clusters the embedded node vectors based on the improved method of density peaks, so that the structural relationship between the encoded vectors can be better revealed and the overlapping communities which each node is located is determined. Experiments on synthetic networks and real networks data show that the proposed algorithm can efficiently find the overlapping community structure in networks, and it is superior to other algorithms especially in complex networks with high structural complexity.

Key words: overlapping community detection, network embedding, density peak, complex network, belonging coefficient

中图分类号:

TP391

张一鸣,王国胤,胡军,傅顺. 基于密度峰值和网络嵌入的重叠社区发现[J]. 《山东大学学报(理学版)》, 2021, 56(1): 91-102.

Yi-ming ZHANG,Guo-yin WANG,Jun HU,Shun FU. Overlapping community detection based on density peaks and network embedding[J]. JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2021, 56(1): 91-102.

图/表 9

图1

表1

表2

表3

图2

图3

图4

图5

图6

参考文献 32

1	RODRIGUEZ A , LAIO A . Clustering by fast search and find of density peaks[J]. Science, 2014, 344 (6191): 1492- 1496. doi: 10.1126/science.1242072
2	CUI P , WANG X , PEI J , et al. A survey on network embedding[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 31 (5): 833- 852.
3	PALLA G , DERÉNYI I , FARKAS I , et al. Uncovering the overlapping community structure of complex networks in nature and society[J]. Nature, 2005, 435 (7043): 814- 818. doi: 10.1038/nature03607
4	RAGHAVAN U N , ALBERT R , KUMARA S . Near linear time algorithm to detect community structures in large-scale networks[J]. Physical Review E, 2007, 76 (3): 036106. doi: 10.1103/PhysRevE.76.036106
5	GREGORY S . Finding overlapping communities in networks by label propagation[J]. New Journal of Physics, 2010, 12 (10): 103018. doi: 10.1088/1367-2630/12/10/103018
6	LU M , ZHANG Z , QU Z , et al. LPANNI: overlapping community detection using label propagation in large-scale complex networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 31 (9): 1736- 1749.
7	AHN Y Y , BAGROW J P , LEHMANN S . Link communities reveal multiscale complexity in networks[J]. Nature, 2010, 466 (7307): 761- 764. doi: 10.1038/nature09182
8	LANCICHINETTI A , FORTUNATO S , KERTÉSZ J . Detecting the overlapping and hierarchical community structure in complex networks[J]. New Journal of Physics, 2009, 11 (3): 033015. doi: 10.1088/1367-2630/11/3/033015
9	LANCICHINETTI A , RADICCHI F , RAMASCO J J , et al. Finding statistically significant communities in networks[J]. PloS One, 2011, 6 (4): e18961. doi: 10.1371/journal.pone.0018961
10	WHANG J J, GLEICH D F, DHILLON I S. Overlapping community detection using seed set expansion[C]//Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. New York: ACM, 2013: 2099-2108.
11	BAI X , YANG P , SHI X . An overlapping community detection algorithm based on density peaks[J]. Neurocomputing, 2017, 226, 7- 15. doi: 10.1016/j.neucom.2016.11.019
12	黄岚, 李玉, 王贵参, 等. 基于点距离和密度峰值聚类的社区发现方法[J]. 吉林大学学报(工学版), 2016, 46 (6): 2042- 2051.
	HUANG Lan , LI Yu , WANG Guishen , et al. Community detection method based on vertex distance and clustering of density peaks[J]. Journal of Jilin University (Engineering and Technology Edition), 2016, 46 (6): 2042- 2051.
13	PEROZZI B, AL-RFOU R, SKIENA S. Deepwalk: online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2014: 701-710.
14	GROVER A, LESKOVEC J. node2vec: scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2016: 855-864.
15	WANG D, CUI P, ZHU W. Structural deep network embedding[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2016: 1225-1234.
16	TANG J, QU M, WANG M, et al. Line: large-scale information network embedding[C]//Proceedings of the 24th International Conference on World Wide Web. Switzerland: ACM, 2015: 1067-1077.
17	LI A Q, AHMED A, RAVI S, et al. Reducing the sampling complexity of topic models[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014: 891-900.
18	MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems. Lake Tahoe: ACM, 2013: 3111-3119.
19	RECHT B, RE C, WRIGHT S, et al. Hogwild: a lock-free approach to parallelizing stochastic gradient descent[C]//Advances in Neural Information Processing Systems. Granada: ACM, 2011: 693-701.
20	MCDAID A F, GREENE D, HURLEY N. Normalized mutual information to evaluate overlapping community finding algorithms[J]. arXiv Preprint arXiv: 1110.2515, 2011.
21	COLLINS L M , DENT C W . Omega: a general formulation of the rand index of cluster recovery suitable for non-disjoint solutions[J]. Multivariate Behavioral Research, 1988, 23 (2): 231- 242. doi: 10.1207/s15327906mbr2302_6
22	SHEN H , CHENG X , CAI K , et al. Detect overlapping and hierarchical community structure in networks[J]. Physica A: Statistical Mechanics and Its Applications, 2009, 388 (8): 1706- 1712. doi: 10.1016/j.physa.2008.12.021
23	HUBERT L , ARABIE P . Comparing partitions[J]. Journal of Classification, 1985, 2 (1): 193- 218. doi: 10.1007/BF01908075
24	XIE J, SZYMANSKI B K. Towards linear time overlapping community detection in social networks[C]//Pacific-Asia Conference on Knowledge Discovery and Data Mining. Berlin: Springer, 2012: 25-36.
25	COSCIA M, ROSSETTI G, GIANNOTTI F, et al. Demon: a local-first discovery method for overlapping communities[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2012: 615-623.
26	LANCICHINETTI A , FORTUNATO S , RADICCHI F . Benchmark graphs for testing community detection algorithms[J]. Physical Review E, 2008, 78 (4): 046110. doi: 10.1103/PhysRevE.78.046110
27	ZACHARY W W . An information flow model for conflict and fission in small groups[J]. Journal of Anthropological Research, 1977, 33 (4): 452- 473. doi: 10.1086/jar.33.4.3629752
28	GIRVAN M , NEWMAN M E J . Community structure in social and biological networks[J]. Proceedings of the National Academy of Sciences, 2002, 99 (12): 7821- 7826. doi: 10.1073/pnas.122653799
29	YIN H, BENSON A R, LESKOVEC J, et al. Local higher-order graph clustering[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2017: 555-564.
30	GLEISER P M , DANON L . Community structure in jazz[J]. Advances in Complex Systems, 2003, 6 (4): 565- 573. doi: 10.1142/S0219525903001067
31	KNUTH D E . The stanford graphbase: a platform for combinatorial computing[M]. New York: ACM Press, 1993.
32	LESKOVEC J, MCAULEY J J. Learning to discover social circles in ego networks[C]//Advances in Neural Information Processing Systems. Lake Tahoe: ACM, 2012: 539-547.

多维度评价

Viewed

Full text

Abstract

Cited

Shared

Discussed

参数	说明
N	网络节点数
k	节点平均度
max k	节点最大度
min c	社区最小节点数
max c	社区最大节点数
on	重叠社区节点数
om	重叠节点所属社区的数目
mut	网络结构混合参数
muw	边权重混合参数

ID	N	k	max k	min c	max c	on	om	mut	muw
R1	500	10	30	10	30	50	2	[0.2, 0.8]	0.1
R2	1 000	10	50	20	50	100	2	[0.2, 0.8]	0.1
R3	5 000	20	100	30	100	500	2	[0.2, 0.8]	0.1
R4	500	10	30	10	30	50	[2, 8]	0.3	0.3
R5	1 000	10	50	20	50	100	[2, 8]	0.3	0.3
R6	5 000	20	100	30	100	500	[2, 8]	0.6	0.3

数据集名称	节点数	边数	平均度	聚集系数
karate	34	78	4.590	0.571
football	115	613	10.660	0.403
email	1 005	25 571	9.622	0.220
Pol.Book	105	441	8.400	0.487
Jazz	198	2 742	27.700	0.618
Lesmis	77	256	6.600	0.573
facebook	4 039	88 234	43.780	0.605

[1]	许侃,刘瑞鑫,林鸿飞,刘海峰,冯娇娇,李家平,林原,徐博. 基于异质网络嵌入的学术论文推荐方法[J]. 《山东大学学报(理学版)》, 2020, 55(11): 35-45.
[2]	郝秀梅,刘纪芹. 外P-模糊集的截集与扩展粗集模型[J]. 《山东大学学报(理学版)》, 2020, 55(10): 1-6.
[3]	王鑫,左万利,朱枫彤,王英. 基于重要结点的社区发现算法[J]. 山东大学学报 (理学版), 2018, 53(11): 67-77.
[4]	魏思敏,张宪华,张祯,孟庆春,张夏然. 基于复杂网络的虚拟品牌社区意见领袖识别研究——以魅族Flyme社区为例[J]. 《山东大学学报(理学版)》, 2018, 53(11): 26-34.
[5]	王亚奇,王静. 考虑好奇心理机制的动态复杂网络谣言传播研究[J]. 山东大学学报（理学版）, 2017, 52(6): 99-104.
[6]	翟鹏,李登道. 基于高斯隶属度的包容性指标模糊聚类算法[J]. 山东大学学报（理学版）, 2016, 51(5): 102-105.
[7]	赵斌,何泾沙,张伊璇. 基于信息熵隶属度的决策属性权重确定方法[J]. 山东大学学报（理学版）, 2016, 51(3): 86-90.
[8]	孙松涛, 何炎祥, 蔡瑞, 李飞, 贺飞艳. 面向微博情感评测任务的多方法对比研究[J]. 山东大学学报（理学版）, 2014, 49(11): 43-50.