您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

《山东大学学报(理学版)》 ›› 2021, Vol. 56 ›› Issue (1): 91-102.doi: 10.6040/j.issn.1671-9352.4.2020.145

•   • 上一篇    下一篇

基于密度峰值和网络嵌入的重叠社区发现

张一鸣(),王国胤*(),胡军,傅顺   

  1. 重庆邮电大学计算智能重庆市重点实验室, 重庆 400065
  • 收稿日期:2020-06-15 出版日期:2021-01-01 发布日期:2021-01-05
  • 通讯作者: 王国胤 E-mail:s180201010@stu.cqupt.edu.cn;wanggy@cqupt.edu.cn
  • 作者简介:张一鸣(1993—),男,硕士研究生,研究方向为智能信息处理. E-mail:s180201010@stu.cqupt.edu.cn
  • 基金资助:
    国家重点研发计划资助项目(2017YFC0804002);国家自然科学基金资助项目(61936001);国家自然科学基金资助项目(61772096);重庆市自然科学基金资助项目(cstc2019jcyj-cxttX0002)

Overlapping community detection based on density peaks and network embedding

Yi-ming ZHANG(),Guo-yin WANG*(),Jun HU,Shun FU   

  1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received:2020-06-15 Online:2021-01-01 Published:2021-01-05
  • Contact: Guo-yin WANG E-mail:s180201010@stu.cqupt.edu.cn;wanggy@cqupt.edu.cn

摘要:

密度峰值是一种基于密度的聚类算法, 该算法假设类簇中心点具有较高的密度且被密度较小的节点包围。由于图结构的性质, 密度峰值无法直接适用于网络结构, 现有的基于密度峰值的社区发现算法大部分是基于图的拓扑结构或者邻接矩阵度量节点近似度, 这种方法往往引入较大的计算复杂度。文中结合网络嵌入方法通过低维向量表示网络中的节点信息, 提出了一种基于密度峰值和网络嵌入的重叠社区发现算法(overlapping community detection based on density network embedding, OCDDNE)。该算法首先通过网络嵌入获取节点的网络结构特征, 然后基于改进的密度峰值的方法对嵌入后的节点向量进行多标签聚类, 使编码后的向量之间的结构关系得到更好的揭示, 从而发现网络中的重叠社区结构。在人工网络和真实网络的验证实验表明, 该算法可以有效的挖掘网络中的重叠社区结构, 并在结构复杂度较高的网络中优于其他算法。

关键词: 重叠社区发现, 网络嵌入, 密度峰值, 复杂网络, 隶属度

Abstract:

Density peaks is a density-based clustering algorithm. It assumes that the center points of the cluster have higher density and are surrounded by nodes with lower density. Due to the character of graph structure, density peaks cannot be directly applied to network structure, and most of density peaks based community detection algorithms based on density peaks are based on graph topology or adjacency matrix to measure node approximation, which often leads to great computational complexity. This paper proposes an overlapping community detection algorithm based on density and network embedding (OCDDNE). The proposed algorithm firstly embeds the network structure characteristics of nodes in network structure through network embedding, and then clusters the embedded node vectors based on the improved method of density peaks, so that the structural relationship between the encoded vectors can be better revealed and the overlapping communities which each node is located is determined. Experiments on synthetic networks and real networks data show that the proposed algorithm can efficiently find the overlapping community structure in networks, and it is superior to other algorithms especially in complex networks with high structural complexity.

Key words: overlapping community detection, network embedding, density peak, complex network, belonging coefficient

中图分类号: 

  • TP391

图1

karate数据集通过决策图选取中心节点示例(图中红色的节点为被选中的中心节点)"

表1

LFR基准参数"

参数 说明
N 网络节点数
k 节点平均度
max k 节点最大度
min c 社区最小节点数
max c 社区最大节点数
on 重叠社区节点数
om 重叠节点所属社区的数目
mut 网络结构混合参数
muw 边权重混合参数

表2

人工网络数据集参数设定"

ID N k max k min c max c on om mut muw
R1 500 10 30 10 30 50 2 [0.2, 0.8] 0.1
R2 1 000 10 50 20 50 100 2 [0.2, 0.8] 0.1
R3 5 000 20 100 30 100 500 2 [0.2, 0.8] 0.1
R4 500 10 30 10 30 50 [2, 8] 0.3 0.3
R5 1 000 10 50 20 50 100 [2, 8] 0.3 0.3
R6 5 000 20 100 30 100 500 [2, 8] 0.6 0.3

表3

真实数据集参数信息"

数据集名称 节点数 边数 平均度 聚集系数
karate 34 78 4.590 0.571
football 115 613 10.660 0.403
email 1 005 25 571 9.622 0.220
Pol.Book 105 441 8.400 0.487
Jazz 198 2 742 27.700 0.618
Lesmis 77 256 6.600 0.573
facebook 4 039 88 234 43.780 0.605

图2

不同混合度参数mut下的实验结果比较"

图3

不同参数om下的实验结果"

图4

真实网络数据集实验结果"

图5

真实网络中心点选取(图中红色的节点为被选中的中心节点)"

图6

不同参数λ和θ下的实验结果"

1 RODRIGUEZ A , LAIO A . Clustering by fast search and find of density peaks[J]. Science, 2014, 344 (6191): 1492- 1496.
doi: 10.1126/science.1242072
2 CUI P , WANG X , PEI J , et al. A survey on network embedding[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 31 (5): 833- 852.
3 PALLA G , DERÉNYI I , FARKAS I , et al. Uncovering the overlapping community structure of complex networks in nature and society[J]. Nature, 2005, 435 (7043): 814- 818.
doi: 10.1038/nature03607
4 RAGHAVAN U N , ALBERT R , KUMARA S . Near linear time algorithm to detect community structures in large-scale networks[J]. Physical Review E, 2007, 76 (3): 036106.
doi: 10.1103/PhysRevE.76.036106
5 GREGORY S . Finding overlapping communities in networks by label propagation[J]. New Journal of Physics, 2010, 12 (10): 103018.
doi: 10.1088/1367-2630/12/10/103018
6 LU M , ZHANG Z , QU Z , et al. LPANNI: overlapping community detection using label propagation in large-scale complex networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 31 (9): 1736- 1749.
7 AHN Y Y , BAGROW J P , LEHMANN S . Link communities reveal multiscale complexity in networks[J]. Nature, 2010, 466 (7307): 761- 764.
doi: 10.1038/nature09182
8 LANCICHINETTI A , FORTUNATO S , KERTÉSZ J . Detecting the overlapping and hierarchical community structure in complex networks[J]. New Journal of Physics, 2009, 11 (3): 033015.
doi: 10.1088/1367-2630/11/3/033015
9 LANCICHINETTI A , RADICCHI F , RAMASCO J J , et al. Finding statistically significant communities in networks[J]. PloS One, 2011, 6 (4): e18961.
doi: 10.1371/journal.pone.0018961
10 WHANG J J, GLEICH D F, DHILLON I S. Overlapping community detection using seed set expansion[C]//Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. New York: ACM, 2013: 2099-2108.
11 BAI X , YANG P , SHI X . An overlapping community detection algorithm based on density peaks[J]. Neurocomputing, 2017, 226, 7- 15.
doi: 10.1016/j.neucom.2016.11.019
12 黄岚, 李玉, 王贵参, 等. 基于点距离和密度峰值聚类的社区发现方法[J]. 吉林大学学报(工学版), 2016, 46 (6): 2042- 2051.
HUANG Lan , LI Yu , WANG Guishen , et al. Community detection method based on vertex distance and clustering of density peaks[J]. Journal of Jilin University (Engineering and Technology Edition), 2016, 46 (6): 2042- 2051.
13 PEROZZI B, AL-RFOU R, SKIENA S. Deepwalk: online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2014: 701-710.
14 GROVER A, LESKOVEC J. node2vec: scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2016: 855-864.
15 WANG D, CUI P, ZHU W. Structural deep network embedding[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2016: 1225-1234.
16 TANG J, QU M, WANG M, et al. Line: large-scale information network embedding[C]//Proceedings of the 24th International Conference on World Wide Web. Switzerland: ACM, 2015: 1067-1077.
17 LI A Q, AHMED A, RAVI S, et al. Reducing the sampling complexity of topic models[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014: 891-900.
18 MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]//Advances in Neural Information Processing Systems. Lake Tahoe: ACM, 2013: 3111-3119.
19 RECHT B, RE C, WRIGHT S, et al. Hogwild: a lock-free approach to parallelizing stochastic gradient descent[C]//Advances in Neural Information Processing Systems. Granada: ACM, 2011: 693-701.
20 MCDAID A F, GREENE D, HURLEY N. Normalized mutual information to evaluate overlapping community finding algorithms[J]. arXiv Preprint arXiv: 1110.2515, 2011.
21 COLLINS L M , DENT C W . Omega: a general formulation of the rand index of cluster recovery suitable for non-disjoint solutions[J]. Multivariate Behavioral Research, 1988, 23 (2): 231- 242.
doi: 10.1207/s15327906mbr2302_6
22 SHEN H , CHENG X , CAI K , et al. Detect overlapping and hierarchical community structure in networks[J]. Physica A: Statistical Mechanics and Its Applications, 2009, 388 (8): 1706- 1712.
doi: 10.1016/j.physa.2008.12.021
23 HUBERT L , ARABIE P . Comparing partitions[J]. Journal of Classification, 1985, 2 (1): 193- 218.
doi: 10.1007/BF01908075
24 XIE J, SZYMANSKI B K. Towards linear time overlapping community detection in social networks[C]//Pacific-Asia Conference on Knowledge Discovery and Data Mining. Berlin: Springer, 2012: 25-36.
25 COSCIA M, ROSSETTI G, GIANNOTTI F, et al. Demon: a local-first discovery method for overlapping communities[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2012: 615-623.
26 LANCICHINETTI A , FORTUNATO S , RADICCHI F . Benchmark graphs for testing community detection algorithms[J]. Physical Review E, 2008, 78 (4): 046110.
doi: 10.1103/PhysRevE.78.046110
27 ZACHARY W W . An information flow model for conflict and fission in small groups[J]. Journal of Anthropological Research, 1977, 33 (4): 452- 473.
doi: 10.1086/jar.33.4.3629752
28 GIRVAN M , NEWMAN M E J . Community structure in social and biological networks[J]. Proceedings of the National Academy of Sciences, 2002, 99 (12): 7821- 7826.
doi: 10.1073/pnas.122653799
29 YIN H, BENSON A R, LESKOVEC J, et al. Local higher-order graph clustering[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2017: 555-564.
30 GLEISER P M , DANON L . Community structure in jazz[J]. Advances in Complex Systems, 2003, 6 (4): 565- 573.
doi: 10.1142/S0219525903001067
31 KNUTH D E . The stanford graphbase: a platform for combinatorial computing[M]. New York: ACM Press, 1993.
32 LESKOVEC J, MCAULEY J J. Learning to discover social circles in ego networks[C]//Advances in Neural Information Processing Systems. Lake Tahoe: ACM, 2012: 539-547.
[1] 许侃,刘瑞鑫,林鸿飞,刘海峰,冯娇娇,李家平,林原,徐博. 基于异质网络嵌入的学术论文推荐方法[J]. 《山东大学学报(理学版)》, 2020, 55(11): 35-45.
[2] 郝秀梅,刘纪芹. 外P-模糊集的截集与扩展粗集模型[J]. 《山东大学学报(理学版)》, 2020, 55(10): 1-6.
[3] 王鑫,左万利,朱枫彤,王英. 基于重要结点的社区发现算法[J]. 山东大学学报 (理学版), 2018, 53(11): 67-77.
[4] 魏思敏,张宪华,张祯,孟庆春,张夏然. 基于复杂网络的虚拟品牌社区意见领袖识别研究——以魅族Flyme社区为例[J]. 《山东大学学报(理学版)》, 2018, 53(11): 26-34.
[5] 王亚奇,王静. 考虑好奇心理机制的动态复杂网络谣言传播研究[J]. 山东大学学报(理学版), 2017, 52(6): 99-104.
[6] 翟鹏,李登道. 基于高斯隶属度的包容性指标模糊聚类算法[J]. 山东大学学报(理学版), 2016, 51(5): 102-105.
[7] 赵斌,何泾沙,张伊璇. 基于信息熵隶属度的决策属性权重确定方法[J]. 山东大学学报(理学版), 2016, 51(3): 86-90.
[8] 孙松涛, 何炎祥, 蔡瑞, 李飞, 贺飞艳. 面向微博情感评测任务的多方法对比研究[J]. 山东大学学报(理学版), 2014, 49(11): 43-50.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 石长光 . Faddeev模型中的多孤立子解[J]. J4, 2007, 42(7): 38 -40 .
[2] 董炯,曹小红. 算子立方的Weyl定理及其紧摄动[J]. 山东大学学报(理学版), 2016, 51(8): 15 -21 .
[3] . 银杏叶提取物中总黄酮含量的分析方法研究[J]. J4, 2009, 44(5): 40 -44 .
[4] 刘汝军,曹玉霞,周 平 . 利用小反馈实现离散非线性混沌系统的反控制[J]. J4, 2007, 42(7): 30 -32 .
[5] 许万银. 一类拟线性Neumann问题的多重解[J]. J4, 2009, 44(10): 39 -42 .
[6] 杜瑞颖, 杨勇, 陈晶, 王持恒. 一种基于相似度的高效网络流量识别方案[J]. 山东大学学报(理学版), 2014, 49(09): 109 -114 .
[7] 郭腓望,张习勇,韩文报. 一种完全非线性函数的构造[J]. J4, 2011, 46(3): 26 -30 .
[8] 范铭, 刘均, 郑庆华, 田振洲, 庄尔悦, 刘烃. 基于栈行为动态胎记的软件抄袭检测方法[J]. 山东大学学报(理学版), 2014, 49(09): 9 -16 .
[9] 马嘉赛,张永军 . 最小方方法的一种优化方法[J]. J4, 2006, 41(3): 104 -107 .
[10] 张凤霞1,李莹1,2,郭文彬1,赵建立1. 分块Hermite阵与斜Hermite阵的最大秩与最小秩[J]. J4, 2010, 45(4): 106 -110 .