
Journal of Shandong University (Natural Science) ›› 2023, Vol. 58 ›› Issue (12): 22-30. doi: 10.6040/j.issn.1671-9352.1.2022.8766


Label guided multi-scale graph neural network for protein-protein interactions prediction

Xinsheng WANG, Xiaofei ZHU*, Chenghong LI

  1. School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2022-09-29 Published: 2023-12-20 Online: 2023-12-19
  • Contact: Xiaofei ZHU E-mail: wxscc0610@2020.cqut.edu.cn; zxf@cqut.edu.cn
  • About the author: Xinsheng WANG (born 1997), male, master's student; research interests are graph neural networks and natural language processing. E-mail: wxscc0610@2020.cqut.edu.cn
  • Funding:
    National Natural Science Foundation of China (62141201); Chongqing Technology Innovation and Application Development Special Project (cstc2020jscx-dxwtBX0014); Key Project of the Language and Writing Research Program of the Chongqing Municipal Education Commission (yyk20103); Chongqing University of Technology Graduate Innovation Project (gzlcx20223227)

Abstract:

A label guided multi-scale graph neural network method for protein-protein interaction prediction (LGMG-PPI) is proposed. It both strengthens the representation of the data and uses label information to guide learning. First, multi-scale graph representations are obtained through graph data augmentation and fed into a graph neural network to obtain multi-scale protein representations, and contrastive learning is introduced to further improve these representations. Second, a self-learning label relation graph is constructed to learn the relationships between labels and obtain label feature representations. Finally, the label feature representations guide the prediction of protein-protein interactions. In experiments on three public datasets, LGMG-PPI outperforms the best baseline method: its micro-F1 scores on SHS27k, SHS148k and STRING are higher by 2.01%, 0.94% and 0.93%, respectively.

Key words: protein-protein interactions, graph neural network, graph data augmentation, label relation graph
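The first step of the abstract (augment the graph at several scales, encode each view, then align the views with contrastive learning) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which augmentations or which contrastive loss the authors use, so random edge dropping at two rates, a single mean-aggregation layer, and an InfoNCE-style loss are stand-ins, and all function names and toy values are hypothetical.

```python
import math
import random


def drop_edges(edges, drop_prob, rng):
    """Keep each edge independently with probability 1 - drop_prob."""
    return [edge for edge in edges if rng.random() >= drop_prob]


def mean_aggregate(features, edges):
    """One round of mean aggregation over each node and its neighbours.

    A deliberately simple stand-in for a graph neural network layer.
    """
    neighbours = {node: [node] for node in features}  # include a self-loop
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(next(iter(features.values())))
    return {
        node: [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        for node, nbrs in neighbours.items()
    }


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss: node i in view A should match node i in view B."""
    nodes = sorted(view_a)
    total = 0.0
    for index, i in enumerate(nodes):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in nodes]
        positive = logits[index]
        total += math.log(sum(math.exp(x) for x in logits)) - positive
    return total / len(nodes)


# Toy graph: 4 proteins with 2-d features (all values are made up).
rng = random.Random(0)
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [1.0, -1.0]}
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]

# Two assumed "scales": a lightly and a heavily perturbed copy of the graph.
view_light = mean_aggregate(features, drop_edges(edges, 0.1, rng))
view_heavy = mean_aggregate(features, drop_edges(edges, 0.5, rng))
loss = info_nce(view_light, view_heavy)
```

The loss is small when each protein's embedding in one view is closest to its own embedding in the other view, which is the property a contrastive objective is meant to encourage.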

CLC number:

  • TP391

Fig. 1

Framework of the LGMG-PPI model
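The label-guided part of the framework (learn a label relation graph, derive label feature representations, and use them to guide prediction) might look roughly like the sketch below. This is not the authors' implementation: the softmax-normalised relation matrix, the element-wise product used as the pair representation, and the sigmoid scoring are all assumptions, and the names `propagate_labels` and `predict_interaction_types` are hypothetical. In a real model the relation logits and label features would be trainable parameters; here they are fixed toy values.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_labels(label_features, relation_logits):
    """Mix label features with a row-normalised label relation matrix.

    relation_logits[i][j] expresses how strongly label i draws on label j.
    """
    weights = [softmax(row) for row in relation_logits]
    dim = len(label_features[0])
    count = len(label_features)
    return [
        [sum(w[j] * label_features[j][d] for j in range(count)) for d in range(dim)]
        for w in weights
    ]


def predict_interaction_types(protein_u, protein_v, label_repr):
    """Return one independent probability per interaction type (multi-label)."""
    pair = [a * b for a, b in zip(protein_u, protein_v)]  # assumed pair encoding
    scores = [sum(p * l for p, l in zip(pair, label)) for label in label_repr]
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


# Toy example with 3 labels and 2-d embeddings (all values are made up).
label_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relation_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
label_repr = propagate_labels(label_features, relation_logits)
probabilities = predict_interaction_types([1.0, 2.0], [1.0, -1.0], label_repr)
```

Because each label gets its own sigmoid, a protein pair can be assigned several interaction types at once, which matches the multi-label setting implied by the seven labels in the datasets.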

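Table 1

Dataset statistics

| Dataset | Nodes | Edges | Amino acids (avg.) | Labels |
| --- | --- | --- | --- | --- |
| SHS27k | 1,690 | 7,624 | 571 | 7 |
| SHS148k | 5,189 | 44,488 | 597 | 7 |
| STRING | 15,335 | 593,397 | 604 | 7 |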

Table 2

Micro-F1 of different models on different datasets

| Method | SHS27k | SHS148k | STRING |
| --- | --- | --- | --- |
| SVM | 75.35±1.05 | 80.55±0.23 | |
| RF | 78.45±0.88 | 82.10±0.20 | 88.91±0.08 |
| LR | 71.55±0.93 | 67.00±0.07 | 67.74±0.16 |
| DPPI | 73.99±5.04 | 77.48±1.39 | 94.85±0.13 |
| DNN-PPI | 77.89±4.97 | 88.49±0.48 | 83.08±0.11 |
| PIPR | 83.31±0.75 | 90.05±2.59 | 94.43±0.10 |
| GNN-PPI | 87.91±0.39 | 92.26±0.10 | 95.43±0.10 |
| LGMG-PPI | 89.68±0.10 | 93.13±0.03 | 96.32±0.04 |
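The table reports micro-F1, and the abstract quotes gains of 2.01%, 0.94% and 0.93% over the best baseline. The sketch below shows the standard multi-label micro-F1 definition (counts pooled over all samples and labels) and checks the quoted gains against the GNN-PPI and LGMG-PPI rows. The arithmetic suggests the gains are relative improvements rather than absolute point differences; that reading is an inference from the numbers, and the helper names are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label 0/1 matrices (lists of rows)."""
    tp = fp = fn = 0
    for truth_row, pred_row in zip(y_true, y_pred):
        for truth, pred in zip(truth_row, pred_row):
            if truth == 1 and pred == 1:
                tp += 1
            elif truth == 0 and pred == 1:
                fp += 1
            elif truth == 1 and pred == 0:
                fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def relative_gain(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent."""
    return 100.0 * (new_score - baseline_score) / baseline_score


# Mean micro-F1 values copied from the table (standard deviations ignored).
gnn_ppi = {"SHS27k": 87.91, "SHS148k": 92.26, "STRING": 95.43}
lgmg_ppi = {"SHS27k": 89.68, "SHS148k": 93.13, "STRING": 96.32}

gains = {
    name: round(relative_gain(lgmg_ppi[name], gnn_ppi[name]), 2)
    for name in gnn_ppi
}
```

For comparison, the absolute differences are 1.77, 0.87 and 0.89 points, which do not match the quoted percentages.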

Table 3

Ablation study

| Method | SHS27k | SHS148k | STRING |
| --- | --- | --- | --- |
| LGMG-PPI | 89.68±0.10 | 93.13±0.03 | 96.32±0.04 |
| w/o MS-GDA($\mathscr{T}_{1}$) | 89.35±0.05 | 93.05±0.11 | 96.14±0.17 |
| w/o MS-GDA($\mathscr{T}_{2}$) | 89.23±0.10 | 92.95±0.04 | 96.28±0.17 |
| w/o MS-GDA | 88.97±0.09 | 92.80±0.14 | 95.92±0.03 |
| w/o SL-LRG | 89.39±0.12 | 92.87±0.04 | 96.04±0.02 |
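To read the ablation table at a glance, one can subtract each variant's mean score from the full model's. The short sketch below does that and reports which removal costs the most on each dataset. It uses only the mean values from the table and ignores the standard deviations, so small differences should be read with caution; the variant labels `T1` and `T2` are plain-text stand-ins for $\mathscr{T}_{1}$ and $\mathscr{T}_{2}$.

```python
DATASETS = ("SHS27k", "SHS148k", "STRING")
FULL_MODEL = (89.68, 93.13, 96.32)
ABLATIONS = {
    "w/o MS-GDA(T1)": (89.35, 93.05, 96.14),
    "w/o MS-GDA(T2)": (89.23, 92.95, 96.28),
    "w/o MS-GDA": (88.97, 92.80, 95.92),
    "w/o SL-LRG": (89.39, 92.87, 96.04),
}


def score_drops(full, variants):
    """Micro-F1 points lost by each variant relative to the full model."""
    return {
        name: tuple(round(f - s, 2) for f, s in zip(full, scores))
        for name, scores in variants.items()
    }


drops = score_drops(FULL_MODEL, ABLATIONS)

# For each dataset, the variant whose removal hurts the most.
largest_drop = {
    dataset: max(drops, key=lambda name: drops[name][column])
    for column, dataset in enumerate(DATASETS)
}
```

On these means, removing MS-GDA entirely gives the largest drop on all three datasets (0.71, 0.33 and 0.40 points), followed by removing SL-LRG on SHS148k and STRING.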

Fig. 2

Validation of the effectiveness of the SL-LRG topology

Fig. 3

Validation of the effectiveness of the SL-LRG node features
