您的位置:山东大学 -> 科技期刊社 -> 《山东大学学报(理学版)》

《山东大学学报(理学版)》 ›› 2021, Vol. 56 ›› Issue (1): 91-102.doi: 10.6040/j.issn.1671-9352.4.2020.145

• • 上一篇    

基于密度峰值和网络嵌入的重叠社区发现

张一鸣,王国胤*,胡军,傅顺   

  1. 重庆邮电大学计算智能重庆市重点实验室, 重庆 400065
  • 发布日期:2021-01-05
  • 作者简介:张一鸣(1993— ),男,硕士研究生,研究方向为智能信息处理. E-mail:s180201010@stu.cqupt.edu.cn*通信作者简介:王国胤(1970— ),男,教授,博导,研究方向为粒计算、认知计算、智能信息处理. E-mail:wanggy@cqupt.edu.cn
  • 基金资助:
    国家重点研发计划资助项目(2017YFC0804002);国家自然科学基金资助项目(61936001,61772096);重庆市自然科学基金资助项目(cstc2019jcyj-cxttX0002)

Overlapping community detection based on density peaks and network embedding

ZHANG Yi-ming, WANG Guo-yin*, HU Jun, FU Shun   

  1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Published:2021-01-05

摘要: 密度峰值是一种基于密度的聚类算法,该算法假设类簇中心点具有较高的密度且被密度较小的节点包围。由于图结构的性质,密度峰值无法直接适用于网络结构,现有的基于密度峰值的社区发现算法大部分是基于图的拓扑结构或者邻接矩阵度量节点近似度,这种方法往往引入较大的计算复杂度。文中结合网络嵌入方法通过低维向量表示网络中的节点信息,提出了一种基于密度峰值和网络嵌入的重叠社区发现算法(overlapping community detection based on density network embedding, OCDDNE)。该算法首先通过网络嵌入获取节点的网络结构特征,然后基于改进的密度峰值的方法对嵌入后的节点向量进行多标签聚类,使编码后的向量之间的结构关系得到更好的揭示,从而发现网络中的重叠社区结构。在人工网络和真实网络的验证实验表明,该算法可以有效的挖掘网络中的重叠社区结构,并在结构复杂度较高的网络中优于其他算法。

关键词: 重叠社区发现, 网络嵌入, 密度峰值, 复杂网络, 隶属度

Abstract: Density peaks is a density-based clustering algorithm. It assumes that the center points of the cluster have higher density and are surrounded by nodes with lower density. Due to the character of graph structure, density peaks cannot be directly applied to network structure, and most of density peaks based community detection algorithms based on density peaks are based on graph topology or adjacency matrix to measure node approximation, which often leads to great computational complexity. This paper proposes an overlapping community detection algorithm based on density and network embedding(OCDDNE). The proposed algorithm firstly embeds the network structure characteristics of nodes in network structure through network embedding, and then clusters the embedded node vectors based on the improved method of density peaks, so that the structural relationship between the encoded vectors can be better revealed and the overlapping communities which each node is located is determined. Experiments on synthetic networks and real networks data show that the proposed algorithm can efficiently find the overlapping community structure in networks, and it is superior to other algorithms especially in complex networks with high structural complexity.

Key words: overlapping community detection, network embedding, density peak, complex network, belonging coefficient

中图分类号: 

  • TP391
[1] RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496.
[2] CUI P, WANG X, PEI J, et al. A survey on network embedding[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 31(5):833-852.
[3] PALLA G, DERÉNYI I, FARKAS I, et al. Uncovering the overlapping community structure of complex networks in nature and society[J]. Nature, 2005, 435(7043):814-818.
[4] RAGHAVAN U N, ALBERT R, KUMARA S. Near linear time algorithm to detect community structures in large-scale networks[J]. Physical Review E, 2007, 76(3):036106.
[5] GREGORY S. Finding overlapping communities in networks by label propagation[J]. New Journal of Physics, 2010, 12(10):103018.
[6] LU M, ZHANG Z, QU Z, et al. LPANNI: overlapping community detection using label propagation in large-scale complex networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 31(9):1736-1749.
[7] AHN Y Y, BAGROW J P, LEHMANN S. Link communities reveal multiscale complexity in networks[J]. Nature, 2010, 466(7307):761-764.
[8] LANCICHINETTI A, FORTUNATO S, KERTÉSZ J. Detecting the overlapping and hierarchical community structure in complex networks[J]. New Journal of Physics, 2009, 11(3):033015.
[9] LANCICHINETTI A, RADICCHI F, RAMASCO J J, et al. Finding statistically significant communities in networks[J]. PloS One, 2011, 6(4):e18961.
[10] WHANG J J, GLEICH D F, DHILLON I S. Overlapping community detection using seed set expansion[C] //Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. New York: ACM, 2013: 2099-2108.
[11] BAI X, YANG P, SHI X. An overlapping community detection algorithm based on density peaks[J]. Neurocomputing, 2017, 226:7-15.
[12] 黄岚, 李玉, 王贵参, 等. 基于点距离和密度峰值聚类的社区发现方法[J]. 吉林大学学报(工学版), 2016, 46(6):2042-2051. HUANG Lan, LI Yu, WANG Guishen, et al. Community detection method based on vertex distance and clustering of density peaks[J]. Journal of Jilin University(Engineering and Technology Edition), 2016, 46(6):2042-2051.
[13] PEROZZI B, AL-RFOU R, SKIENA S. Deepwalk: online learning of social representations[C] //Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2014: 701-710.
[14] GROVER A, LESKOVEC J. node2vec: scalable feature learning for networks[C] //Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2016: 855-864.
[15] WANG D, CUI P, ZHU W. Structural deep network embedding[C] //Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data mining. New York: ACM, 2016: 1225-1234.
[16] TANG J, QU M, WANG M, et al. Line: large-scale information network embedding[C] //Proceedings of the 24th International Conference on World Wide Web. Switzerland: ACM, 2015: 1067-1077.
[17] LI A Q, AHMED A, RAVI S, et al. Reducing the sampling complexity of topic models[C] //Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014: 891-900.
[18] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C] //Advances in Neural Information Processing Systems. Lake Tahoe: ACM, 2013: 3111-3119.
[19] RECHT B, RE C, WRIGHT S, et al. Hogwild: a lock-free approach to parallelizing stochastic gradient descent[C] //Advances in Neural Information Processing Systems. Granada: ACM, 2011: 693-701.
[20] MCDAID A F, GREENE D, HURLEY N. Normalized mutual information to evaluate overlapping community finding algorithms[J]. arXiv Preprint arXiv:1110.2515, 2011.
[21] COLLINS L M, DENT C W. Omega: a general formulation of the rand index of cluster recovery suitable for non-disjoint solutions[J]. Multivariate Behavioral Research, 1988, 23(2):231-242.
[22] SHEN H, CHENG X, CAI K, et al. Detect overlapping and hierarchical community structure in networks[J]. Physica A: Statistical Mechanics and Its Applications, 2009, 388(8):1706-1712.
[23] HUBERT L, ARABIE P. Comparing partitions[J]. Journal of Classification, 1985, 2(1):193-218.
[24] XIE J, SZYMANSKI B K. Towards linear time overlapping community detection in social networks[C] //Pacific-Asia Conference on Knowledge Discovery and Data Mining. Berlin: Springer, 2012: 25-36.
[25] COSCIA M, ROSSETTI G, GIANNOTTI F, et al. Demon: a local-first discovery method for overlapping communities[C] //Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2012: 615-623.
[26] LANCICHINETTI A, FORTUNATO S, RADICCHI F. Benchmark graphs for testing community detection algorithms[J]. Physical Review E, 2008, 78(4):046110.
[27] ZACHARY W W. An information flow model for conflict and fission in small groups[J]. Journal of Anthropological Research, 1977, 33(4):452-473.
[28] GIRVAN M, NEWMAN M E J. Community structure in social and biological networks[J]. Proceedings of the National Academy of Sciences, 2002, 99(12):7821-7826.
[29] YIN H, BENSON A R, LESKOVEC J, et al. Local higher-order graph clustering[C] //Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2017: 555-564.
[30] GLEISER P M, DANON L. Community structure in jazz[J]. Advances in Complex Systems, 2003, 6(4):565-573.
[31] KNUTH D E. The stanford graphbase: a platform for combinatorial computing[M]. New York: ACM Press, 1993.
[32] LESKOVEC J, MCAULEY J J. Learning to discover social circles in ego networks[C] //Advances in Neural Information Processing Systems. Lake Tahoe: ACM, 2012: 539-547.
[1] 许侃,刘瑞鑫,林鸿飞,刘海峰,冯娇娇,李家平,林原,徐博. 基于异质网络嵌入的学术论文推荐方法[J]. 《山东大学学报(理学版)》, 2020, 55(11): 35-45.
[2] 郝秀梅,刘纪芹. 外P-模糊集的截集与扩展粗集模型[J]. 《山东大学学报(理学版)》, 2020, 55(10): 1-6.
[3] 王鑫,左万利,朱枫彤,王英. 基于重要结点的社区发现算法[J]. 山东大学学报 (理学版), 2018, 53(11): 67-77.
[4] 魏思敏,张宪华,张祯,孟庆春,张夏然. 基于复杂网络的虚拟品牌社区意见领袖识别研究——以魅族Flyme社区为例[J]. 《山东大学学报(理学版)》, 2018, 53(11): 26-34.
[5] 王亚奇,王静. 考虑好奇心理机制的动态复杂网络谣言传播研究[J]. 山东大学学报(理学版), 2017, 52(6): 99-104.
[6] 翟鹏,李登道. 基于高斯隶属度的包容性指标模糊聚类算法[J]. 山东大学学报(理学版), 2016, 51(5): 102-105.
[7] 赵斌,何泾沙,张伊璇. 基于信息熵隶属度的决策属性权重确定方法[J]. 山东大学学报(理学版), 2016, 51(3): 86-90.
[8] 孙松涛, 何炎祥, 蔡瑞, 李飞, 贺飞艳. 面向微博情感评测任务的多方法对比研究[J]. 山东大学学报(理学版), 2014, 49(11): 43-50.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!