Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》), 2024, Vol. 59, Issue (3): 1-13. doi: 10.6040/j.issn.1671-9352.7.2023.787

Heterogeneous network representation learning based on metapath attribute fusion

Jinghong WANG(1,2,3), Zhibing WU(1), Peng HUANG(1), Jiateng YANG(1), Bi LI(4,*)

  1. School of Computer and Cyberspace Security, Hebei Normal University, Shijiazhuang 050024, Hebei, China
  2. Hebei Key Laboratory of Network and Information Security (Hebei Normal University), Shijiazhuang 050024, Hebei, China
  3. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Security (Hebei Normal University), Shijiazhuang 050024, Hebei, China
  4. School of Business, Hebei Normal University, Shijiazhuang 050024, Hebei, China
  • Received: 2023-04-29 Online: 2024-03-20 Published: 2024-03-06
  • Contact: Bi LI E-mail: wangjinghong@126.com; libilb@263.net
  • About the author: Jinghong WANG (born 1967), female, professor, master's supervisor, Ph.D.; research interests include artificial intelligence, data mining, and pattern recognition. E-mail: wangjinghong@126.com
  • Funding: Natural Science Foundation of Hebei Province (F2021205014); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022139); Central Government Guided Local Science and Technology Development Fund (226Z1808G)

Abstract:

This paper studies representation learning for information networks and proposes a heterogeneous graph neural network based on metapath information fusion, the metapath attribute fusion graph neural network (MAFGNN). Before metapaths are introduced into the heterogeneous network, the model integrates the target node's neighbor information, including metapath information, into the node itself, so that the target node and its neighbors are fused. The method first applies a dimension transformation to the attribute features of the different node types to ease the later fusion steps, and then fuses the target node's information by computing weights between the target node and its neighbor nodes. Next, target nodes are fused along each specific metapath, and finally the different semantic information is fused across metapaths. Experiments on several heterogeneous information datasets show that MAFGNN achieves the best performance and more accurate predictions than the state-of-the-art baselines on heterogeneous network node embedding.

Key words: metapath, heterogeneous information network, heterogeneous graph embedding, information fusion, attention mechanism

CLC number: TP181

Figure 1. Heterogeneous information network and metapaths

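Table 1. Summary of the symbols used in this paper

| Symbol | Meaning |
|---|---|
| $G$ | $G=(V, E)$, the heterogeneous information network |
| $V$, $E$ | Node set and edge set of the heterogeneous network |
| $P$ | Metapath |
| $G^{P}$ | Subgraph based on metapath $P$ |
| $\boldsymbol{h}$, $\boldsymbol{h}'$ | Initial node feature vector and transformed node feature vector |
| $p$ | Path instance based on metapath $P$ |
| $W$ | Node-type-specific transformation matrix |
| $e_{i,j}^{P}$ | Importance coefficient of node pair $(i, j)$ based on metapath $P$ |
| $\boldsymbol{h}_{node}^{P}$ | Node-level attention vector for metapath $P$ |
| $Z_v$ | Semantic-level node embedding vector |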

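Figure 2. Schematic of the MAFGNN model

Figure 3. Network structures obtained from different metapaths

Figure 4. Neighbor nodes of the target node under different metapaths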

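Table 2. The MAFGNN algorithm

Algorithm 1. Heterogeneous network representation learning based on metapath and attribute fusion
Input: heterogeneous information network $G=(V, E)$; node features $\boldsymbol{h}=(\boldsymbol{h}_1, \boldsymbol{h}_2, \cdots, \boldsymbol{h}_N)$; metapath set $\{P_1, P_2, \cdots, P_n\}$; node types $A=\{A_1, A_2, \cdots, A_{|A|}\}$
Output: final node representation $Z$
for v in V do  // process every target node
    Use Eq. (1) to transform the feature dimensions of the target node and its neighbor nodes, giving the transformed vectors $\boldsymbol{h}_v^{\prime}$ and $\boldsymbol{h}_u^{\prime}$
    Use Eqs. (2) and (3) to compute the weight coefficients between the transformed target node and its neighbor nodes and obtain the output: $\boldsymbol{h}=a_{v, u} \boldsymbol{h}_v^{\prime}=a_{v, u}\left(\boldsymbol{h}_1, \boldsymbol{h}_2, \cdots, \boldsymbol{h}_N\right)$
end for
for P_i in {P_1, P_2, …, P_n} do
    for v in V do
        Use Eq. (4) to compute the importance $\Phi_{i,j}^{P}$ of the neighbor nodes based on metapath $P_i$
        Use Eq. (5) to compute the metapath-based node embedding $\boldsymbol{h}_{node}^{P}$ of the target node
        if the number of attention heads K > 1 then
            Compute the embedding with Eq. (7)
        end if
    end for
end for
Use Eq. (8) to obtain the importance $\psi^{P}$ of each metapath
Use Eq. (9) to obtain the final target-node representation vector $Z_v$
return $Z_v$, $v \in V$  // return the target-node vectors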
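The flow of Algorithm 1 can be illustrated with a small pure-Python sketch. Equations (1) to (9) are not reproduced on this page, so the scoring functions below are assumptions borrowed from the common HAN-style formulation (a LeakyReLU-scored softmax over neighbors, then a tanh-scored softmax over metapaths) and are not the authors' exact formulas. All names (`node_level`, `semantic_level`, `a_vec`, `q_vec`, the toy nodes) are hypothetical, and the multi-head case (K > 1) is omitted.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, h):
    # Type-specific projection: one matrix W per node type.
    return [dot(row, h) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def node_level(h_proj, neighbors, a_vec):
    """Attention-weighted sum of each node's metapath-based neighbors."""
    out = {}
    for v, nbrs in neighbors.items():
        # Score each (v, u) pair on the concatenation [h'_v ; h'_u] (assumed form).
        scores = [leaky_relu(dot(a_vec, h_proj[v] + h_proj[u])) for u in nbrs]
        alpha = softmax(scores)
        dim = len(h_proj[v])
        out[v] = [sum(a * h_proj[u][k] for a, u in zip(alpha, nbrs))
                  for k in range(dim)]
    return out

def semantic_level(per_path, q_vec):
    """Weight each metapath by its mean score against q_vec, then fuse."""
    nodes = list(per_path[0])
    scores = [sum(dot(q_vec, [math.tanh(x) for x in emb[v]]) for v in nodes)
              / len(nodes) for emb in per_path]
    beta = softmax(scores)
    dim = len(q_vec)
    Z = {v: [sum(b * emb[v][k] for b, emb in zip(beta, per_path))
             for k in range(dim)] for v in nodes}
    return Z, beta

# Toy data: node -> (type, raw features); types may have different dimensions.
features = {"a1": ("A", [1.0, 0.0]), "a2": ("A", [0.0, 1.0]),
            "a3": ("A", [1.0, 1.0]), "p1": ("P", [1.0, 2.0, 3.0])}
W = {"A": [[1.0, 0.0], [0.0, 1.0]],
     "P": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]}
h_proj = {v: matvec(W[t], h) for v, (t, h) in features.items()}

# Metapath-based neighbor lists for the target (author) nodes, one dict per metapath.
neighbors_by_path = [
    {"a1": ["a1", "a2"], "a2": ["a1", "a2"], "a3": ["a3"]},
    {"a1": ["a1", "a3"], "a2": ["a2", "a3"], "a3": ["a1", "a2", "a3"]},
]
a_vec = [0.1, 0.2, 0.3, 0.4]   # hypothetical node-level attention parameters
q_vec = [0.5, -0.5]            # hypothetical semantic-level attention vector

per_path = [node_level(h_proj, nb, a_vec) for nb in neighbors_by_path]
Z, beta = semantic_level(per_path, q_vec)
```

Here `beta` plays the role of the per-metapath importance and `Z` the fused target-node embeddings; in a trained model `W`, `a_vec` and `q_vec` would be learned rather than fixed by hand.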

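Table 3. Datasets used in the experiments

| Dataset | Node types (number of nodes) | Edge types (number of edges) | Metapaths |
|---|---|---|---|
| DBLP | A: 4 057; P: 14 328; T: 7 723; V: 20 | AP: 19 654; PT: 85 810; PV: 14 328 | APA, APCPA, APTPA |
| IMDB | M: 4 278; D: 2 081; A: 5 257 | MD: 4 278; MA: 12 828 | MAM, MDM |
| ACM | A: 5 912; P: 3 025; S: 57 | PA: 9 936; PS: 3 025 | PAP, PSP |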
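To make the metapath column concrete, the following sketch derives metapath-based neighbors by chaining edge relations, for example A-P followed by P-A for APA. It is a minimal illustration on a made-up toy graph, not the preprocessing used in the paper; the function names and the node identifiers are hypothetical.

```python
from collections import defaultdict

def invert(adj):
    """Reverse a relation, e.g. author->papers becomes paper->authors."""
    rev = defaultdict(set)
    for src, dsts in adj.items():
        for d in dsts:
            rev[d].add(src)
    return dict(rev)

def metapath_neighbors(start_nodes, hops):
    """Follow a sequence of relations and collect the end nodes reached."""
    result = {}
    for s in start_nodes:
        frontier = {s}
        for adj in hops:
            frontier = {t for n in frontier for t in adj.get(n, ())}
        result[s] = frontier
    return result

# Toy DBLP-like graph: authors (A) write papers (P), papers contain terms (T).
author_paper = {"a1": {"p1"}, "a2": {"p1", "p2"}, "a3": {"p3"}}
paper_term = {"p1": {"t1"}, "p2": {"t2"}, "p3": {"t1"}}
paper_author = invert(author_paper)
term_paper = invert(paper_term)

# APA: authors who share a paper. APTPA: authors whose papers share a term.
apa = metapath_neighbors(author_paper, [author_paper, paper_author])
aptpa = metapath_neighbors(
    author_paper, [author_paper, paper_term, term_paper, paper_author])
```

In this toy graph `a3` has no co-author, so its only APA neighbor is itself, while APTPA connects it to `a1` and `a2` through the shared term `t1`. This shows how different metapaths expose different semantics for the same target node.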

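Table 4. Node classification results

| Dataset | Metric | DeepWalk | metapath2vec | GCN | GAT | HERec | HAN | MAFGNN |
|---|---|---|---|---|---|---|---|---|
| DBLP | Macro-F1 | 84.81 | 91.89 | 92.38 | 91.73 | 92.34 | 93.80 | 95.81 |
| DBLP | Micro-F1 | 86.26 | 92.80 | 93.09 | 92.55 | 93.27 | 93.99 | 96.00 |
| IMDB | Macro-F1 | 50.35 | 45.15 | 51.81 | 52.99 | 47.64 | 55.54 | 56.76 |
| IMDB | Micro-F1 | 54.33 | 48.81 | 54.61 | 56.97 | 50.99 | 57.60 | 59.20 |
| ACM | Macro-F1 | 57.38 | 53.99 | 61.89 | 66.39 | 73.92 | 77.32 | 84.83 |
| ACM | Micro-F1 | 57.57 | 54.31 | 61.51 | 71.57 | 73.84 | 78.23 | 85.77 |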
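For readers unfamiliar with the two metrics in Table 4, the sketch below computes Macro-F1 and Micro-F1 from predicted and true labels using their standard definitions. The labels are made up for illustration, the function name `f1_scores` is hypothetical, and the page does not say which tool the authors used for evaluation.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, so every sample counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

When every sample has exactly one label, Micro-F1 equals plain accuracy (4 of 6 correct here), whereas Macro-F1 is pulled down by any class that is classified poorly.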

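Table 5. Node clustering results

| Dataset | Metric | DeepWalk | metapath2vec | GCN | GAT | HERec | HAN | MAFGNN |
|---|---|---|---|---|---|---|---|---|
| DBLP | NMI | 76.53 | 74.30 | 75.01 | 71.50 | 76.73 | 81.98 | 88.21 |
| DBLP | ARI | 81.35 | 78.50 | 80.49 | 77.26 | 80.98 | 87.37 | 92.41 |
| IMDB | NMI | 1.45 | 1.20 | 5.45 | 8.45 | 5.45 | 20.94 | 25.78 |
| IMDB | ARI | 2.14 | 1.70 | 4.40 | 7.46 | 4.40 | 23.70 | 29.52 |
| ACM | NMI | 41.61 | 21.22 | 40.44 | 56.26 | 40.70 | 59.17 | 68.61 |
| ACM | ARI | 35.10 | 21.00 | 29.59 | 53.69 | 37.13 | 59.48 | 71.39 |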
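The two clustering metrics in Table 5 can be computed from a contingency count of true classes against predicted clusters. The sketch below follows the standard definitions of normalized mutual information (NMI) and the adjusted Rand index (ARI). The arithmetic-mean normalization in `nmi` is an assumption, since the page does not state which normalization the authors used, and the example labels are made up.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (assumed variant)."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    cu = Counter(labels_true)
    cv = Counter(labels_pred)
    mi = sum((c / n) * math.log(c * n / (cu[u] * cv[v]))
             for (u, v), c in joint.items())
    hu = -sum((c / n) * math.log(c / n) for c in cu.values())
    hv = -sum((c / n) * math.log(c / n) for c in cv.values())
    denom = (hu + hv) / 2
    return mi / denom if denom > 0 else 1.0

def ari(labels_true, labels_pred):
    """Adjusted Rand index from pair counts."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(math.comb(c, 2) for c in joint.values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

truth = [0, 0, 1, 1, 2, 2]
relabeled = [1, 1, 0, 0, 2, 2]   # same partition, different cluster ids
one_cluster = [0] * 6            # everything lumped into one cluster

nmi_same, ari_same = nmi(truth, relabeled), ari(truth, relabeled)
nmi_lump, ari_lump = nmi(truth, one_cluster), ari(truth, one_cluster)
```

Both metrics ignore the cluster ids themselves: a relabeled but identical partition scores 1, while lumping everything into one cluster scores 0.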

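Figure 5. Node embedding results

Figure 6. Training epochs versus loss on the DBLP dataset

Figure 7. Training epochs versus Micro-F1 on the DBLP dataset

Figure 8. Experiment on the number of attention heads

Figure 9. Experiment on the dimension of the semantic-level attention vector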
