Heterogeneous network representation learning based on metapath attribute fusion

doi:10.6040/j.issn.1671-9352.7.2023.787

Abstract

This paper studies representation learning on heterogeneous information networks and proposes the metapath attribute fusion graph neural network (MAFGNN). Before metapaths are introduced, MAFGNN integrates the information of a target node's neighbors, including metapath information, into the node itself, so that target-node and neighbor information are fused. The method first projects the attribute features of different node types into a common dimension to facilitate subsequent fusion. It then computes weights between each target node and its neighbor nodes to fuse the target node's information, aggregates target nodes along specific metapaths, and finally fuses the different semantic information carried by the different metapaths. Experiments on multiple heterogeneous information network datasets show that MAFGNN performs better and predicts more accurately than state-of-the-art baselines on heterogeneous network node embedding.

Key words: metapath, heterogeneous information network, heterogeneous graph embedding, information fusion, attention mechanism

CLC Number: TP181

Jinghong WANG,Zhibing WU,Peng HUANG,Jiateng YANG,Bi LI. Heterogeneous network representation learning based on metapath attribute fusion[J].JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE), 2024, 59(3): 1-13.

References 27

1	ATWOOD J , TOWSLEY D . Diffusion-convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2016, 29, 2001- 2009.
2	SHI Chuan , LI Yitong , ZHANG Jiawei , et al. A survey of heterogeneous information network analysis[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 29 (1): 17- 37.
3	SUN Yizhou , HAN Jiawei . Mining heterogeneous information networks: a structural analysis approach[J]. Association for Computing Machinery, 2013, 14 (2): 20- 28.
4	CUI Peng , WANG Xiao , PEI Jian , et al. A survey on network embedding[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 31 (5): 833- 852.
5	CAO B, LIU N N, YANG Q. Transfer learning for collective link prediction in multiple heterogenous domains[C]//WROBEL S. Proceedings of the 27th International Conference on Machine Learning. Haifa, Israel: DAUMÉ Ⅲ H, 2010: 159-166.
6	MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07)[2023-09-23]. https://arxiv.org/abs/1301.3781.
7	PEROZZI B, AL-RFOU R, SKIENA S. Deepwalk: online learning of social representations[C]//MACSKASSY S. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery, 2014: 701-710.
8	DONG Y, CHAWLA N V, SWAMI A. Metapath2vec: scalable representation learning for heterogeneous networks[C]//MATWIN S. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax, Canada: Association for Computing Machinery, 2017: 135-144.
9	SUN Yizhou , HAN Jiawei , YAN Xifeng , et al. Pathsim: meta path-based top-k similarity search in heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2011, 4 (11): 992- 1003. doi: 10.14778/3402707.3402736
10	LEE S, PARK C, YU H. BHIN2vec: balancing the type of relation in heterogeneous information network[C]// ZHU Wenwu. Proceedings of the 28th ACM International Conference on Information and Knowledge Management. New York: Association for Computing Machinery, 2019: 619-628.
11	WANG X, JI H, SHI C, et al. Heterogeneous graph attention network[C]// LING L. The World Wide Web Conference. New York: Association for Computing Machinery, 2019: 2022-2032.
12	FU Xinyu, ZHANG Jiani, MENG Ziqiao, et al. MAGNN: metapath aggregated graph neural network for heterogeneous graph embedding[C]//HUANG Y N. Proceedings of The Web Conference 2020. Taipei: Association for Computing Machinery, 2020: 2331-2341.
13	WANG Xiao , LU Yuanfu , SHI Chuan , et al. Dynamic heterogeneous information network embedding with meta-path based proximity[J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 34 (3): 1117- 1132.
14	XUE Hansheng, YANG Luwei, JIANG Wen, et al. Modeling dynamic heterogeneous network for link prediction using hierarchical attention with temporal RNN[C]//BIE T D. Machine Learning and Knowledge Discovery in Databases. Ghent, Belgium: Springer, 2020: 282-298.
15	MIKOLOV T , SUTSKEVER I , CHEN K , et al. Distributed representations of words and phrases and their compositionality[J]. Advances in Neural Information Processing Systems, 2013, 26, 3111- 3119.
16	TANG Jian, QU Meng, WANG Mingzhe, et al. LINE: large-scale information network embedding[C]//GANGEMI A. Proceedings of the 24th International Conference on World Wide Web. Florence, Italy: Association for Computing Machinery, 2015: 1067-1077.
17	RIBEIRO L F R, SAVERESE P H P, FIGUEIREDO D R. Struc2vec: learning node representations from structural identity[C]//MATWIN S. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax, Canada: Association for Computing Machinery, 2017: 385-394.
18	WU Zonghan , PAN Shirui , CHEN Fengwen , et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32 (1): 4- 24.
19	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22)[2023-09-23]. https://arxiv.org/abs/1609.02907.
20	VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[EB/OL]. (2018-02-04)[2023-09-23]. https://arxiv.org/abs/1710.10903.
21	VASWANI A , SHAZEER N , PARMAR N , et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30, 6000- 6010.
22	FU T, LEE W C, LEI Z. HIN2Vec: explore meta-paths in heterogeneous information networks for representation learning[C]//LIM E P. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. Singapore: Association for Computing Machinery, 2017: 1797-1806.
23	SHANG Jingbo, QU Meng, LIU Jialu, et al. Meta-path guided embedding for similarity search in large-scale heterogeneous information networks[EB/OL]. (2016-10-31)[2023-09-23]. https://arxiv.org/abs/1610.09769.
24	GUAN Mengya, CAI Xinjun, SHANG Jiaxing, et al. HMSG: heterogeneous graph neural network based on metapath subgraph learning[EB/OL]. (2021-09-07)[2023-09-23]. https://arxiv.org/abs/2109.02868.
25	ZHOU Sheng, BU Jiajun, WANG Xin, et al. HAHE: hierarchical attentive heterogeneous information network embedding[EB/OL]. (2019-05-14)[2023-09-23]. https://arxiv.org/abs/1902.01475.
26	ZHANG Chuxu, SONG Dongjin, HUANG Chao, et al. Heterogeneous graph neural network[C]//TEREDESAI A. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Anchorage, USA: Association for Computing Machinery, 2019: 793-803.
27	BOLYA D, FU C Y, DAI X L, et al. Hydra attention: efficient attention with many heads[EB/OL]. (2023-02-12)[2023-09-23]. https://arxiv.org/abs/2209.07484.

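Symbol	Description
G	Heterogeneous information network, G=(V, E)
V, E	Node set and edge set of the heterogeneous network
P	Metapath
G^P	Subgraph based on metapath P
h, h′	Initial node feature vector; transformed node feature vector
p	Path instance of metapath P
W	Node-type-specific transformation matrix
e_{i, j}^P	Importance coefficient of node pair (i, j) under metapath P
h_node^P	Node-level attention vector for metapath P
Z_v	Semantic-level node embedding vector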
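
The type-specific matrix W listed above maps the features of each node type into a shared space. The following is a minimal sketch, assuming Eq. (1) (not reproduced on this page) has the linear form h′ = W·h with one W per node type. The feature sizes, the random initialisation, and the helper names `make_projection` and `project` are illustrative assumptions, not the authors' code.

```python
import random


def make_projection(in_dim, out_dim, rng):
    """Hypothetical random initialisation of one type-specific matrix W (out_dim x in_dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(W, h):
    """Compute h' = W h for a single node feature vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


rng = random.Random(0)

# Assumed raw feature sizes per node type (illustrative values only).
raw_dims = {"A": 5, "P": 8, "T": 3}
shared_dim = 4

# One transformation matrix per node type.
W = {node_type: make_projection(d, shared_dim, rng) for node_type, d in raw_dims.items()}

# One example node per type, keyed by (node type, node id).
features = {
    ("A", 0): [1.0] * 5,
    ("P", 0): [0.5] * 8,
    ("T", 0): [2.0] * 3,
}

# After projection every node lives in the same shared_dim-dimensional space.
projected = {key: project(W[key[0]], h) for key, h in features.items()}
```

After this step, vectors of different node types have equal length, which is what makes the later weighted fusion well defined.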

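Algorithm 1 Heterogeneous network representation learning based on metapath and attribute fusion
Input: heterogeneous information network G={V, E}; node features h=(h₁, h₂, …, h_N); metapath set {P₁, P₂, …, P_n}; node types A={A₁, A₂, …, A_|A|};
Output: final node representations Z.
1	for v in V do // process every target node
2	Use Eq. (1) to transform the feature dimensions of the target node and its neighbor nodes, giving the transformed vectors h′_v and h′_u;
3	Use Eqs. (2) and (3) to compute the weight coefficients between the transformed target node and its neighbor nodes, giving the output $\boldsymbol{h}=a_{v, u} \boldsymbol{h}_v^{\prime}=a_{v, u}\left(\boldsymbol{h}_1, \boldsymbol{h}_2, \cdots, \boldsymbol{h}_N\right)$;
4	end for // end of loop
5	for P_i in {P₁, P₂, …, P_n} do
6	for v in V do
7	Use Eq. (4) to compute the importance Φ_{i, j}^P of the neighbor nodes based on metapath P_i;
8	Use Eq. (5) to obtain the metapath-based node embedding h_node^P of the target node;
9	if the number of attention heads K>1 then
10	compute the embedding with Eq. (7); end if
11	end for // end of loop
12	end for // end of loop
13	Use Eq. (8) to obtain the importance ψ_P of each metapath;
14	Use Eq. (9) to obtain the final target node representation Z_v;
15	return Z_v for all v∈V // return the target node vectors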
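
Steps 5 to 14 of Algorithm 1 can be illustrated with a small single-head sketch. Equations (4) to (9) are not reproduced on this page, so the scoring functions below are assumptions in the style of hierarchical attention models such as HAN: a dot-product score with softmax over metapath neighbors, then a tanh score averaged over nodes for each metapath. The names `node_level`, `semantic_level`, the vector `q`, and the toy graph are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def node_level(h, v, neighbors):
    """Weighted sum of the metapath neighbors of v (dot-product scoring is an assumption)."""
    alpha = softmax([dot(h[v], h[u]) for u in neighbors])
    dim = len(h[v])
    return [sum(a * h[u][k] for a, u in zip(alpha, neighbors)) for k in range(dim)]


def semantic_level(z_by_path, q):
    """Fuse per-metapath embeddings with one softmax weight per metapath (assumed form)."""
    paths = sorted(z_by_path)
    nodes = sorted(z_by_path[paths[0]])
    # Score of a metapath: average over nodes of q . tanh(z).
    scores = [
        sum(dot(q, [math.tanh(x) for x in z_by_path[p][v]]) for v in nodes) / len(nodes)
        for p in paths
    ]
    beta = dict(zip(paths, softmax(scores)))
    fused = {
        v: [sum(beta[p] * z_by_path[p][v][k] for p in paths) for k in range(len(q))]
        for v in nodes
    }
    return fused, beta


# Toy transformed features h' and metapath-based neighbor lists (each node includes itself).
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {
    "PAP": {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]},
    "PSP": {0: [0, 2], 1: [1], 2: [0, 2]},
}

z_by_path = {
    path: {v: node_level(h, v, nbrs[v]) for v in h} for path, nbrs in neighbors.items()
}
Z, beta = semantic_level(z_by_path, q=[0.5, -0.5])
```

The multi-head case of steps 9 and 10 is omitted here; a common choice is to run `node_level` K times with separate parameters and concatenate the results, but the page does not state which variant Eq. (7) uses.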

Dataset	Node type	Number of nodes	Edge type	Number of edges	Metapaths
DBLP	A	4 057	A-P	19 654	APA, APCPA, APTPA
	P	14 328	P-T	85 810
	T	7 723	P-V	14 328
	V	20
IMDB	M	4 278	M-D	4 278	MAM, MDM
	D	2 081	M-A	12 828
	A	5 257
ACM	A	5 912	P-A	9 936	PAP, PSP
	P	3 025	P-S	3 025
	S	57
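
The metapaths in the last column define which nodes count as neighbors of a target node. As a rough sketch for symmetric two-hop metapaths such as PAP or MAM, two end nodes are metapath neighbors when they share a middle node. The function name `symmetric_metapath_neighbors` and the toy edge list are hypothetical, and keeping each node as its own neighbor is an assumption.

```python
from collections import defaultdict


def symmetric_metapath_neighbors(edges):
    """Given (x, y) edges, return {x: set of x' reachable along x-y-x'}.

    Each node is kept as its own neighbor, since x-y-x is a valid path instance.
    """
    by_middle = defaultdict(set)
    for x, y in edges:
        by_middle[y].add(x)

    neighbors = defaultdict(set)
    for group in by_middle.values():
        for x in group:
            neighbors[x] |= group
    return dict(neighbors)


# Toy paper-author edges (illustrative, not taken from the ACM dataset).
paper_author = [
    ("p1", "a1"),
    ("p2", "a1"),
    ("p2", "a2"),
    ("p3", "a2"),
    ("p4", "a3"),
]
pap = symmetric_metapath_neighbors(paper_author)
```

In this toy graph `p2` shares an author with both `p1` and `p3`, so its PAP neighborhood contains all three papers, while `p4` is connected only to itself.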

Dataset	Metrics	DeepWalk	metapath2vec	GCN	GAT	HERec	HAN	MAFGNN
DBLP	Macro-F1	84.81	91.89	92.38	91.73	92.34	93.80	95.81
DBLP	Micro-F1	86.26	92.80	93.09	92.55	93.27	93.99	96.00
IMDB	Macro-F1	50.35	45.15	51.81	52.99	47.64	55.54	56.76
IMDB	Micro-F1	54.33	48.81	54.61	56.97	50.99	57.60	59.20
ACM	Macro-F1	57.38	53.99	61.89	66.39	73.92	77.32	84.83
ACM	Micro-F1	57.57	54.31	61.51	71.57	73.84	78.23	85.77
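
For readers comparing the two metrics in the table above, the sketch below shows how Macro-F1 and Micro-F1 differ on a small made-up label set. Macro-F1 averages the per-class F1 scores, so rare classes weigh as much as common ones, while Micro-F1 pools all decisions first. The function name `f1_scores` and the labels are illustrative, and the table reports values multiplied by 100.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed the true class t

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Made-up labels with an imbalanced class distribution.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 1]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

Here the rare class 2 is never predicted correctly, which pulls Macro-F1 well below Micro-F1. For single-label classification, Micro-F1 equals accuracy.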

Dataset	Metrics	DeepWalk	metapath2vec	GCN	GAT	HERec	HAN	MAFGNN
DBLP	NMI	76.53	74.30	75.01	71.50	76.73	81.98	88.21
DBLP	ARI	81.35	78.50	80.49	77.26	80.98	87.37	92.41
IMDB	NMI	1.45	1.20	5.45	8.45	5.45	20.94	25.78
IMDB	ARI	2.14	1.70	4.40	7.46	4.40	23.70	29.52
ACM	NMI	41.61	21.22	40.44	56.26	40.70	59.17	68.61
ACM	ARI	35.10	21.00	29.59	53.69	37.13	59.48	71.39
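
The clustering metrics in the table above can be computed from the contingency counts between true classes and predicted clusters. The following is a minimal sketch, assuming NMI is normalised by the arithmetic mean of the two entropies (one common convention; the page does not say which normalisation the authors used). The function name `nmi_ari` is hypothetical, and the table reports values multiplied by 100.

```python
import math
from collections import Counter


def nmi_ari(labels_true, labels_pred):
    """Return (NMI, ARI) for two flat labelings of the same items."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)

    # Mutual information and entropies, using natural logarithms.
    mi = sum(c / n * math.log(c * n / (a[u] * b[v])) for (u, v), c in joint.items())
    h_a = -sum(c / n * math.log(c / n) for c in a.values())
    h_b = -sum(c / n * math.log(c / n) for c in b.values())
    mean_h = (h_a + h_b) / 2
    nmi = mi / mean_h if mean_h > 0 else 1.0

    # Adjusted Rand index from pair counts.
    def pairs(x):
        return x * (x - 1) / 2

    sum_joint = sum(pairs(c) for c in joint.values())
    sum_a = sum(pairs(c) for c in a.values())
    sum_b = sum(pairs(c) for c in b.values())
    expected = sum_a * sum_b / pairs(n)
    max_index = (sum_a + sum_b) / 2
    ari = (sum_joint - expected) / (max_index - expected) if max_index != expected else 1.0
    return nmi, ari


# Same partition with renamed clusters: both scores reach their maximum.
perfect = nmi_ari([0, 0, 1, 1], [1, 1, 0, 0])
# Clusters that cut across the true classes: NMI is zero and ARI is negative.
crossed = nmi_ari([0, 0, 1, 1], [0, 1, 0, 1])
```

Both metrics ignore the names of the cluster labels, which is why they suit the evaluation of clusterings of learned embeddings against ground-truth classes.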
