JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2024, Vol. 59, Issue (3): 1-13. doi: 10.6040/j.issn.1671-9352.7.2023.787

Heterogeneous network representation learning based on metapath attribute fusion

Jinghong WANG1,2,3, Zhibing WU1, Peng HUANG1, Jiateng YANG1, Bi LI4,*

    1. School of Computer and Cyberspace Security, Hebei Normal University, Shijiazhuang 050024, Hebei, China
    2. Hebei Key Laboratory of Network and Information Security (Hebei Normal University), Shijiazhuang 050024, Hebei, China
    3. Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Security (Hebei Normal University), Shijiazhuang 050024, Hebei, China
    4. School of Business, Hebei Normal University, Shijiazhuang 050024, Hebei, China
  • Received: 2023-04-29 Online: 2024-03-20 Published: 2024-03-06
  • Contact: Bi LI E-mail: wangjinghong@126.com; libilb@263.net

Abstract:

This paper addresses representation learning on heterogeneous information networks and proposes a metapath attribute fusion graph neural network (MAFGNN). Before metapaths are introduced into the heterogeneous network, the model integrates the information of each target node's neighbors, including metapath information, into the target node, so that target-node and neighbor information are fused. The method first transforms the attribute features of different node types into a common dimension to facilitate the subsequent fusion operations. It then fuses information into each target node by computing weight values between the target node and its neighbor nodes. Next, target nodes are fused along each specific metapath, and finally the semantic information carried by different metapaths is fused. Experiments on multiple heterogeneous information network datasets show that MAFGNN outperforms the compared state-of-the-art baselines on heterogeneous network node embedding and yields more accurate predictions.

Key words: metapath, heterogeneous information network, heterogeneous graph embedding, information fusion, attention mechanism

CLC Number: TP181

Fig.1

Heterogeneous information networks and metapaths

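Table 1

Summary of the symbols used in this article

Symbol                         Explanation
$G$                            $G=(V, E)$, heterogeneous information network
$V$, $E$                       Node set and edge set of the heterogeneous network
$P$                            Metapath
$G_P$                          Subgraph based on metapath $P$
$\boldsymbol{h}$, $\boldsymbol{h}^{\prime}$    Initial node feature vector, transformed node feature vector
$p$                            Path instance based on metapath $P$
$\boldsymbol{W}$               Node-type-specific transformation matrix
$e_{i,j}^{P}$                  Importance coefficient of node pair $(i, j)$ based on metapath $P$
$\boldsymbol{h}_{node}^{P}$    Node-level attention vector of metapath $P$
$\boldsymbol{Z}_v$             Semantic-level node embedding vector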

Fig.2

Schematic diagram of the MAFGNN model

Fig.3

Network structures based on different metapaths

Fig.4

Neighbor nodes of target nodes under different metapaths

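Table 2

MAFGNN algorithm process

Algorithm 1 Heterogeneous network representation learning based on metapath and attribute fusion
Input: heterogeneous information network $G=\{V, E\}$, node features $\boldsymbol{h}=(\boldsymbol{h}_1, \boldsymbol{h}_2, \cdots, \boldsymbol{h}_N)$, metapath set $\{P_1, P_2, \cdots, P_n\}$, node types $A=\{A_1, A_2, \cdots, A_{|A|}\}$;
Output: final node representation $\boldsymbol{Z}$
for v in V do // operate on every target node
  use formula (1) to transform the feature dimensions of the target node and its neighbor nodes, giving the transformed vectors $\boldsymbol{h}_v^{\prime}$, $\boldsymbol{h}_u^{\prime}$;
  use formulas (2) and (3) to compute the weight coefficients between the transformed target node and its neighbor nodes and obtain the output: $\boldsymbol{h}=a_{v, u} \boldsymbol{h}_v^{\prime}=a_{v, u}\left(\boldsymbol{h}_1, \boldsymbol{h}_2, \cdots, \boldsymbol{h}_N\right)$;
end for
for $P_i$ in $\{P_1, P_2, \cdots, P_n\}$ do
  for v in V do
    use formula (4) to compute the neighbor-node importance $\Phi_{i,j}^{P}$ based on metapath $P_i$;
    use formula (5) to compute the metapath-based node embedding $\boldsymbol{h}_{node}^{P}$ of the target node;
    if the number of attention heads K>1 then
      compute the embedding with formula (7)
    end if
  end for
end for
use formula (8) to obtain the importance $\psi_P$ of each metapath;
use formula (9) to obtain the final target node representation vector $\boldsymbol{Z}_v$;
return $\boldsymbol{Z}_v$, $v \in V$ // return the target node vectors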
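
The control flow of Algorithm 1 can be illustrated with a small NumPy sketch. Formulas (1) to (9) are not reproduced on this page, so the scoring functions below are assumptions borrowed from the common attention pattern in heterogeneous graph networks (LeakyReLU scores normalized by softmax at the node level, a tanh projection at the semantic level); they are not the authors' exact equations. All names (`node_level_embedding`, `semantic_fusion`, `W_type`, `a`, `q`) are hypothetical, the weights are random rather than trained, only one node type is projected, and the multi-head case (K>1) is omitted.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def node_level_embedding(h, neighbors, a):
    """Attention-weighted fusion of each node with its metapath neighbors.

    h: (N, d) transformed features; neighbors[v]: indices of v's neighbors
    under one metapath (v itself included); a: (2d,) attention vector.
    """
    out = np.zeros_like(h)
    for v, nbrs in enumerate(neighbors):
        # Assumed importance score of neighbor u for target node v.
        scores = np.array(
            [leaky_relu(a @ np.concatenate([h[v], h[u]])) for u in nbrs]
        )
        alpha = softmax(scores)      # normalized weights, sum to 1
        out[v] = alpha @ h[nbrs]     # weighted sum of neighbor features
    return out

def semantic_fusion(embeddings, W, b, q):
    """Fuse per-metapath embeddings with one weight per metapath."""
    # Assumed metapath score: mean over nodes of q^T tanh(W z + b).
    scores = np.array([np.mean(np.tanh(z @ W.T + b) @ q) for z in embeddings])
    beta = softmax(scores)
    fused = sum(w * z for w, z in zip(beta, embeddings))
    return fused, beta

rng = np.random.default_rng(0)

# Toy data: 3 target nodes with raw dimension 5, projected to d = 4 by a
# type-specific matrix (the feature dimension transformation step).
raw = rng.normal(size=(3, 5))
W_type = rng.normal(size=(4, 5))
h = raw @ W_type.T
d = h.shape[1]

# Neighbor lists under two hypothetical metapaths.
metapath_neighbors = [
    [[0, 1], [0, 1, 2], [1, 2]],
    [[0, 2], [1], [0, 2]],
]

# Node-level fusion per metapath, then semantic-level fusion across metapaths.
per_path = [
    node_level_embedding(h, nbrs, rng.normal(size=2 * d))
    for nbrs in metapath_neighbors
]
Z, beta = semantic_fusion(
    per_path, rng.normal(size=(8, d)), np.zeros(8), rng.normal(size=8)
)
print(Z.shape, beta)
```

The two-stage structure mirrors the loop nesting in the algorithm: the inner loop produces one embedding per node for each metapath, and the final two steps weight the metapaths against each other.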

Table 3

Experimental datasets

Dataset  Node type  Number of nodes  Edge relation  Number of edges  Metapath
DBLP     A          4 057            AP             19 654           APA
         P          14 328           PT             85 810           APCPA
         T          7 723            PV             14 328           APTPA
         V          20
IMDB     M          4 278            MD             4 278            MAM
         D          2 081            MA             12 828           MDM
         A          5 257
ACM      A          5 912            PA             9 936            PAP
         P          3 025            PS             3 025            PSP
         S          57
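
A symmetric metapath such as PAP can be turned into a set of metapath-based neighbors by multiplying incidence matrices. The sketch below shows this on a made-up paper-author matrix; the matrix values and the names `PA`, `pap_counts` and `pap_neighbors` are illustrative assumptions, not data from the ACM dataset.

```python
import numpy as np

# Hypothetical toy paper-author incidence matrix (3 papers x 2 authors);
# PA[i, j] = 1 means paper i was written by author j.
PA = np.array([
    [1, 0],
    [1, 1],
    [0, 1],
])

# Entry (i, j) counts the PAP path instances between paper i and paper j.
pap_counts = PA @ PA.T

# Metapath-based neighbors: papers joined by at least one PAP instance.
# Each paper is its own neighbor because it shares authors with itself.
pap_neighbors = {
    i: [j for j in range(PA.shape[0]) if pap_counts[i, j] > 0]
    for i in range(PA.shape[0])
}

print(pap_counts)
print(pap_neighbors)
```

Longer metapaths follow the same pattern by chaining more matrix products, for example author-paper, paper-term, term-paper, paper-author for APTPA.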

Table 4

Node classification experiment results

Dataset Metrics DeepWalk metapath2vec GCN GAT HERec HAN MAFGNN
DBLP Macro-F1 84.81 91.89 92.38 91.73 92.34 93.80 95.81
DBLP Micro-F1 86.26 92.80 93.09 92.55 93.27 93.99 96.00
IMDB Macro-F1 50.35 45.15 51.81 52.99 47.64 55.54 56.76
IMDB Micro-F1 54.33 48.81 54.61 56.97 50.99 57.60 59.20
ACM Macro-F1 57.38 53.99 61.89 66.39 73.92 77.32 84.83
ACM Micro-F1 57.57 54.31 61.51 71.57 73.84 78.23 85.77
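
For readers unfamiliar with the two metrics, the following sketch computes Macro-F1 and Micro-F1 for single-label multi-class predictions using the standard definitions. The labels are invented for illustration and the function name `f1_scores` is hypothetical; the page does not describe the authors' evaluation code.

```python
import numpy as np

def f1_scores(y_true, y_pred, num_classes):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.zeros(num_classes)
    fp = np.zeros(num_classes)
    fn = np.zeros(num_classes)
    for c in range(num_classes):
        tp[c] = np.sum((y_pred == c) & (y_true == c))
        fp[c] = np.sum((y_pred == c) & (y_true != c))
        fn[c] = np.sum((y_pred != c) & (y_true == c))
    # Per-class F1 = 2TP / (2TP + FP + FN); 0 for classes that never occur.
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros(num_classes), where=denom > 0)
    macro = per_class.mean()  # unweighted mean over classes
    # Micro-F1 pools the counts of all classes before computing F1.
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    return macro, micro

# Made-up labels: class 0 is frequent, classes 1 and 2 are rare.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2]
macro, micro = f1_scores(y_true, y_pred, 3)
print(f"Macro-F1 = {macro:.4f}, Micro-F1 = {micro:.4f}")
```

Macro-F1 gives every class equal weight, while Micro-F1 is dominated by frequent classes and equals accuracy in the single-label setting, which is why the two values can differ on imbalanced datasets.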

Table 5

Node clustering experiment results

Dataset Metrics DeepWalk metapath2vec GCN GAT HERec HAN MAFGNN
DBLP NMI 76.53 74.30 75.01 71.50 76.73 81.98 88.21
DBLP ARI 81.35 78.50 80.49 77.26 80.98 87.37 92.41
IMDB NMI 1.45 1.20 5.45 8.45 5.45 20.94 25.78
IMDB ARI 2.14 1.70 4.40 7.46 4.40 23.70 29.52
ACM NMI 41.61 21.22 40.44 56.26 40.70 59.17 68.61
ACM ARI 35.10 21.00 29.59 53.69 37.13 59.48 71.39
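
The two clustering metrics can be computed from a contingency table of true classes against predicted clusters. The sketch below uses the standard definitions of normalized mutual information (NMI) and the adjusted Rand index (ARI). The arithmetic-mean normalization for NMI is an assumption, since the page does not state which variant the authors used, and the function names are hypothetical.

```python
import numpy as np
from math import comb, log

def contingency(labels_a, labels_b):
    """Count matrix: rows index classes in labels_a, columns clusters in labels_b."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    table = np.zeros((len(a_vals), len(b_vals)), dtype=int)
    for i, j in zip(a_idx, b_idx):
        table[i, j] += 1
    return table

def nmi(labels_a, labels_b):
    """Normalized mutual information, normalized by the mean of the entropies."""
    table = contingency(labels_a, labels_b)
    n = table.sum()
    pa = table.sum(axis=1) / n
    pb = table.sum(axis=0) / n
    mi = 0.0
    for i in range(table.shape[0]):
        for j in range(table.shape[1]):
            if table[i, j] > 0:
                pij = table[i, j] / n
                mi += pij * log(pij / (pa[i] * pb[j]))
    ha = -sum(p * log(p) for p in pa if p > 0)
    hb = -sum(p * log(p) for p in pb if p > 0)
    if ha + hb == 0:
        return 1.0  # both partitions consist of a single group
    return 2 * mi / (ha + hb)

def ari(labels_a, labels_b):
    """Adjusted Rand index: pair-counting agreement corrected for chance."""
    table = contingency(labels_a, labels_b)
    n = int(table.sum())
    if n < 2:
        return 1.0  # no pairs to compare
    sum_cells = sum(comb(int(x), 2) for x in table.ravel())
    sum_rows = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Same partition with swapped cluster ids: both metrics ignore the naming.
same_nmi = nmi([0, 0, 1, 1], [1, 1, 0, 0])
same_ari = ari([0, 0, 1, 1], [1, 1, 0, 0])
# Clusters that cut across the classes: NMI drops to 0 and ARI goes negative.
cross_nmi = nmi([0, 0, 1, 1], [0, 1, 0, 1])
cross_ari = ari([0, 0, 1, 1], [0, 1, 0, 1])
print(same_nmi, same_ari, cross_nmi, cross_ari)
```

Both metrics are invariant to how cluster ids are numbered, which makes them suitable for comparing clusterings of learned embeddings against ground-truth classes.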

Fig.5

Node embedding experiment results

Fig.6

Loss versus number of training epochs on the DBLP dataset

Fig.7

Micro-F1 versus number of training epochs on the DBLP dataset

Fig.8

Experiment on the number of attention heads

Fig.9

Experiment on the dimension of the semantic-level attention vector
