JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2024, Vol. 59 ›› Issue (7): 44-52, 104. DOI: 10.6040/j.issn.1671-9352.1.2023.042

• Review •

Matrix product operator based sequential recommendation model

Peiyu LIU1, Bowen YAO2, Zefeng GAO1,2,*, Wayne Xin ZHAO1,*

  1. Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
    2. Department of Physics, Renmin University of China, Beijing 100872, China
  • Received: 2023-11-24 Online: 2024-07-20 Published: 2024-07-15
  • Contact: Zefeng GAO,Wayne Xin ZHAO E-mail:liupeiyustu@ruc.edu.cn;zfgao@ruc.edu.cn;batmanfly@qq.com

Abstract:

Sequential recommendation confronts challenges of high complexity and substantial diversity. The pre-training and fine-tuning paradigm is widely employed for learning item representations from sequential data in recommendation scenarios. However, prevalent approaches tend to disregard the underfitting and overfitting issues that may arise when fine-tuning a model in a new domain. To address this concern, a novel neural network architecture based on matrix product operators (MPO) is introduced, together with two versatile fine-tuning strategies. First, a lightweight fine-tuning strategy that updates only a subset of parameters is proposed to mitigate overfitting during fine-tuning. Second, an over-parameterization fine-tuning strategy that augments the number of trainable parameters is introduced to address underfitting during fine-tuning. Extensive experiments on well-established open-source datasets demonstrate the effectiveness of the proposed approach in learning general item representations for recommender systems.
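The core mechanism can be illustrated with a small sketch. The following minimal NumPy example (shapes, function names, and the full-rank sequential-SVD scheme are illustrative assumptions, not the paper's exact configuration) decomposes a weight matrix into an MPO of local tensors and contracts it back exactly:

```python
import numpy as np

def mpo_decompose(W, in_shape, out_shape):
    """Split W (prod(in_shape) x prod(out_shape)) into n local tensors
    (an MPO) by sequential full-rank SVD, so that contracting the
    tensors reproduces W exactly."""
    n = len(in_shape)
    # reshape to one (input, output) mode pair per factor, interleaved
    T = W.reshape(*in_shape, *out_shape)
    T = T.transpose([k for i in range(n) for k in (i, n + i)])
    cores, rank = [], 1
    for i in range(n - 1):
        T = T.reshape(rank * in_shape[i] * out_shape[i], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        cores.append(U.reshape(rank, in_shape[i], out_shape[i], len(S)))
        T, rank = np.diag(S) @ Vt, len(S)
    cores.append(T.reshape(rank, in_shape[-1], out_shape[-1], 1))
    return cores

def mpo_contract(cores):
    """Multiply the local tensors back into a dense weight matrix."""
    T = cores[0]
    for core in cores[1:]:
        T = np.tensordot(T, core, axes=([-1], [0]))  # contract bond index
    T = T.squeeze(0).squeeze(-1)  # remaining modes: (i0, o0, i1, o1, ...)
    n = T.ndim // 2
    # gather input modes, then output modes, and flatten back to a matrix
    T = T.transpose([2 * i for i in range(n)] + [2 * i + 1 for i in range(n)])
    return T.reshape(int(np.prod(T.shape[:n])), -1)
```

In this picture, the lightweight strategy would correspond to freezing most local tensors and updating only a small one, while the over-parameterization strategy widens the bond dimensions before fine-tuning to add trainable parameters.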

Key words: recommendation model, sequential data, matrix product operator, overfitting, underfitting

CLC Number: 

  • TP391

Fig.1

Lightweight & over-parameterization fine-tuning

Table 1

Details of datasets for evaluation

Dataset #Users #Items #Interactions Avg. n Avg. c
Scientific 8 842 4 385 52 427 7.04 182.87
Pantry 13 101 4 898 126 962 9.69 83.17
Instruments 24 962 9 964 208 926 8.37 165.18
Arts 45 486 21 019 395 150 8.69 155.57
Office 87 436 25 986 684 837 7.84 193.22

Table 2

Comparison with other baseline models

Dataset Metric S3Rec BERT4Rec CCDR UniSRec MPORec MPORecLight
Scientific hit@10 0.052 5 0.048 8 0.069 5 0.109 5 0.110 3 0.111 6
hit@50 0.141 8 0.118 5 0.164 7 0.211 9 0.205 6 0.222 2
ndcg@10 0.027 5 0.024 3 0.034 0 0.059 8 0.059 6 0.059 9
ndcg@50 0.046 8 0.039 3 0.054 6 0.083 5 0.083 5 0.083 7
Pantry hit@10 0.044 4 0.030 8 0.048 0 0.062 7 0.066 4 0.060 5
hit@50 0.131 5 0.103 0 0.126 2 0.171 1 0.179 0 0.170 1
ndcg@10 0.021 4 0.015 2 0.020 3 0.030 8 0.032 4 0.030 5
ndcg@50 0.040 0 0.030 5 0.038 5 0.054 2 0.056 8 0.054 1
Instruments hit@10 0.105 6 0.081 3 0.084 8 0.112 4 0.116 4 0.107 8
hit@50 0.192 7 0.145 4 0.175 3 0.208 6 0.220 0 0.196 8
ndcg@10 0.071 3 0.062 0 0.045 1 0.065 8 0.067 6 0.062 9
ndcg@50 0.090 1 0.075 6 0.064 7 0.086 7 0.090 1 0.082 3
Arts hit@10 0.110 3 0.072 2 0.067 1 0.101 8 0.101 9 0.093 4
hit@50 0.188 8 0.136 7 0.147 8 0.199 3 0.199 8 0.186 1
ndcg@10 0.060 1 0.047 9 0.034 8 0.057 3 0.057 5 0.051 9
ndcg@50 0.079 3 0.061 9 0.052 3 0.078 4 0.078 9 0.072 0
Office hit@10 0.103 0 0.082 5 0.054 9 0.094 7 0.095 8 0.082 8
hit@50 0.161 3 0.122 7 0.109 5 0.164 7 0.168 4 0.144 2
ndcg@10 0.065 3 0.063 4 0.029 0 0.056 0 0.056 1 0.049 6
ndcg@50 0.078 0 0.072 1 0.040 9 0.071 3 0.071 4 0.062 9

Table 3

Comparison of different fine-tuning strategies

Dataset Metric UniSRec_F MPORec MPORecLight MPORec +ex2 MPORec +ex4 MPORec +ex6 Improvement/%
Scientific hit@10 0.118 8 0.125 2 0.112 1 0.124 3 0.122 7 0.122 0 5.39
hit@50 0.239 4 0.240 0 0.221 2 0.236 0 0.237 6 0.237 9 0.25
ndcg@10 0.064 1 0.065 4 0.060 9 0.065 3 0.065 0 0.065 2 2.03
ndcg@50 0.090 3 0.090 2 0.084 8 0.089 7 0.090 0 0.090 4 0.11
Pantry hit@10 0.063 6 0.067 3 0.061 9 0.066 6 0.067 9 0.069 2 8.81
hit@50 0.165 8 0.180 1 0.169 8 0.179 4 0.178 6 0.180 9 9.11
ndcg@10 0.030 6 0.032 0 0.029 7 0.031 7 0.032 4 0.032 7 6.86
ndcg@50 0.052 7 0.056 4 0.053 1 0.056 1 0.056 2 0.056 9 7.97
Instruments hit@10 0.118 9 0.121 1 0.109 2 0.116 1 0.118 8 0.120 0 1.85
hit@50 0.225 5 0.225 6 0.203 8 0.220 1 0.224 2 0.226 0 0.22
ndcg@10 0.068 0 0.069 0 0.064 1 0.067 3 0.068 0 0.068 8 1.47
ndcg@50 0.091 2 0.091 7 0.084 6 0.089 8 0.090 9 0.091 8 0.66
Arts hit@10 0.106 6 0.108 3 0.092 2 0.107 4 0.105 0 0.103 8 1.59
hit@50 0.204 9 0.212 2 0.183 3 0.209 7 0.206 4 0.204 3 3.56
ndcg@10 0.058 6 0.059 4 0.050 2 0.059 2 0.057 6 0.057 1 1.37
ndcg@50 0.079 9 0.082 1 0.070 1 0.081 5 0.079 7 0.079 0 2.75
Office hit@10 0.101 3 0.102 9 0.088 0 0.100 9 0.101 0 0.100 7 1.58
hit@50 0.170 2 0.171 0 0.150 6 0.169 1 0.168 8 0.168 2 0.47
ndcg@10 0.061 9 0.063 2 0.054 0 0.062 5 0.062 1 0.061 7 2.10
ndcg@50 0.076 9 0.078 1 0.067 6 0.077 3 0.076 9 0.076 5 1.56

Table 4

Impact of different decomposition lengths n

Length n 3 5 7 9
MPORec 0.124 3 0.125 2 0.120 2 0.118 2
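As a rough illustration of why the decomposition length n matters (the factor shapes and rank rule below are assumptions for illustration, not the paper's settings), the parameter count of an MPO follows from the mode sizes and bond ranks:

```python
import numpy as np

def mpo_param_count(in_shape, out_shape, max_rank=None):
    """Count parameters of an MPO for a weight matrix factored as
    prod(in_shape) x prod(out_shape). Bond ranks follow the exact
    sequential-SVD rule r_k = min(left modes product, right modes
    product), optionally capped at max_rank."""
    n = len(in_shape)
    modes = [i * o for i, o in zip(in_shape, out_shape)]
    ranks = [1]
    for k in range(1, n):
        r = min(int(np.prod(modes[:k])), int(np.prod(modes[k:])))
        if max_rank is not None:
            r = min(r, max_rank)
        ranks.append(r)
    ranks.append(1)
    return sum(ranks[k] * modes[k] * ranks[k + 1] for k in range(n))

# a single factor is just the dense matrix: 4 x 4 = 16 parameters
print(mpo_param_count((4,), (4,)))                 # n = 1
# uncapped ranks can exceed the dense count (over-parameterized regime)
print(mpo_param_count((2, 2), (2, 2)))             # n = 2, full rank
# capping the bond rank recovers a compressed (lightweight) variant
print(mpo_param_count((2, 2), (2, 2), max_rank=2))
```

Longer decompositions with capped bond ranks shrink the trainable-parameter budget, while uncapped ranks grow it beyond the dense layer, which is the trade-off Table 4 probes.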

Table 5

Impact of different learning rates

Learning rate 1e-4 2e-4 4e-4 6e-4 8e-4
MPORec 0.119 7 0.119 8 0.120 2 0.122 8 0.123 0

Table 6

Analysis of training efficiency

Model Total params/M Trainable params/M GPU memory/GB Training time/s
UniSRec 6.3 6.3
UniSRec_F 6.3 1.9 7.74 1 498
MPORecLight 6.5 0.3 7.76 703
MPORec 6.5 2.1 7.78 980
+Expand2 6.6 2.2 7.81 1 448
+Expand4 6.7 2.3 7.84 1 567
+Expand6 6.8 2.4 7.88 3 509
1 LI Jing, REN Pengjie, CHEN Zhumin, et al. Neural attentive session-based recommendation[EB/OL]. (2017-11-13)[2023-08-09]. http://arxiv.org/abs/1711.04725.
2 HOU Yupeng, HU Binbin, ZHANG Zhiqiang, et al. CORE: simple and effective session-based recommendation within consistent representation space[C]//SIGIR'22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: Association for Computing Machinery, 2022: 1796-1801.
3 HOU Y, MU S, ZHAO W X, et al. Towards universal sequence representation learning for recommender systems[EB/OL]. (2022-06-13)[2023-04-13]. http://arxiv.org/abs/2206.05941.
4 XU Ruixin, LUO Fuli, ZHANG Zhiyuan, et al. Raise a child in large language model: towards effective and generalizable fine-tuning[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana: Association for Computational Linguistics, 2021: 9514-9528.
5 HIDASI B, KARATZOGLOU A, BALTRUNAS L, et al. Session-based recommendations with recurrent neural networks[C/OL]//4th International Conference on Learning Representations (ICLR 2016). 2016: 1-10. http://arxiv.org/pdf/1511.06939.
6 ZHOU K H, YU H, ZHAO W X, et al. Filter-enhanced MLP is all you need for sequential recommendation[C]//WWW '22: The ACM Web Conference 2022. Lyon: ACM, 2022: 2388-2399.
7 CHANG Jianxin, GAO Chen, ZHENG Yu, et al. Sequential recommendation with graph neural networks[EB/OL]. (2023-07-26)[2023-08-09]. http://arxiv.org/abs/2106.14226.
8 YUAN F, HE X, KARATZOGLOU A, et al. Parameter-efficient transfer from sequential behaviors for user modeling and recommendation[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020. New York: ACM, 2020: 1469-1478.
9 GAO Zefeng, ZHOU Kun, LIU Peiyu, et al. Small pre-trained language models can be fine-tuned as large models via over-parameterization[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics ACL 2023. Toronto: Association for Computational Linguistics, 2023: 3819-3834.
10 GAO Zefeng, CHENG Song, HE Rongqiang, et al. Compressing deep neural networks by matrix product operators[J]. Physical Review Research, 2020, 2(2): 023300.
11 GAO Zefeng, SUN Xingwei, GAO Lan, et al. Compressing LSTM networks by matrix product operators[EB/OL]. (2022-03-31)[2023-05-06]. https://arxiv.org/abs/2012.11943.
12 NOVIKOV A, PODOPRIKHIN D, OSOKIN A, et al. Tensorizing neural networks[EB/OL]. (2015-09-22)[2023-05-06]. https://arxiv.org/abs/1509.06569.
13 GARIPOV T, PODOPRIKHIN D, NOVIKOV A, et al. Ultimate tensorization: compressing convolutional and FC layers alike[EB/OL]. (2016-11-10)[2023-05-06]. https://arxiv.org/abs/1611.03214.
14 LIU P, GAO Z F, ZHAO W X, et al. Enabling lightweight fine-tuning for pre-trained language model compression based on matrix product operators[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2021: 5388-5398.
15 GAO Z F, LIU P, ZHAO W X, et al. Parameter-efficient mixture-of-experts architecture for pre-trained language models[C]//Proceedings of the 29th International Conference on Computational Linguistics. Gyeongju: International Committee on Computational Linguistics, 2022: 3263-3273.
16 LIU Peiyu, GAO Zefeng, CHEN Yushuo, et al. Scaling pre-trained language models to deeper via parameter-efficient architecture[EB/OL]. (2023-04-10)[2023-05-06]. http://arxiv.org/abs/2303.16753.
17 SUN Xingwei, GAO Zefeng, LU Zhengyi, et al. A model compression method with matrix product operators for speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 2837-2847.
18 HU E J, SHEN Y, WALLIS P, et al. LoRA: low-rank adaptation of large language models[EB/OL]. (2021-10-16)[2022-06-16]. http://arxiv.org/abs/2106.09685.
19 NI J, LI J, MCAULEY J J. Justifying recommendations using distantly-labeled reviews and fine-grained aspects[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Hong Kong: Association for Computational Linguistics, 2019: 188-197.
20 ZHOU K, WANG H, ZHAO W X, et al. S3Rec: self-supervised learning for sequential recommendation with mutual information maximization[C/OL]//Proceedings of the 29th ACM International Conference on Information & Knowledge Management. [2023-08-08]. https://doi.org/10.1145/3340531.3411954.
21 SUN Fei, LIU Jun, WU Jian, et al. BERT4Rec: sequential recommendation with bidirectional encoder representations from transformer[C]//Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Beijing: ACM, 2019: 1441-1450.
22 WEN Keyu, TAN Zhenshan, CHENG Qingrong, et al. Contrastive cross-modal knowledge sharing pre-training for vision-language representation learning and retrieval[EB/OL]. (2022-07-08)[2023-10-18]. http://arxiv.org/abs/2207.00733.