
Journal of Shandong University (Natural Science) ›› 2024, Vol. 59 ›› Issue (7): 44-52, 104. DOI: 10.6040/j.issn.1671-9352.1.2023.042

• Review •

Matrix product operator based sequential recommendation model

Peiyu LIU1, Bowen YAO2, Zefeng GAO1,2,*, Wayne Xin ZHAO1,*

  1. Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100872, China
  2. Department of Physics, Renmin University of China, Beijing 100872, China

  • Received: 2023-11-24 Online: 2024-07-20 Published: 2024-07-15
  • Contact: Zefeng GAO, Wayne Xin ZHAO. E-mail: liupeiyustu@ruc.edu.cn; zfgao@ruc.edu.cn; batmanfly@qq.com
  • First author: LIU Peiyu (1992-), male, PhD candidate; research interests: natural language processing and model compression. E-mail: liupeiyustu@ruc.edu.cn
  • Funding: National Natural Science Foundation of China (62206299, 62222215)


Abstract:

The task of sequential recommendation faces challenges of high complexity and substantial diversity. The pre-training and fine-tuning paradigm is widely used to learn item representations from sequential data in recommendation scenarios, but prevalent approaches tend to overlook the underfitting and overfitting problems that can arise when a model is fine-tuned in a new domain. To address this concern, a neural network architecture based on the matrix product operator (MPO) representation is introduced, together with two flexible fine-tuning strategies. First, a lightweight fine-tuning strategy that updates only a subset of parameters effectively mitigates overfitting during fine-tuning. Second, an over-parameterized fine-tuning strategy that increases the number of trainable parameters robustly addresses underfitting. Extensive experiments on established open-source datasets show that the proposed method achieves significant performance improvements, demonstrating its effectiveness for learning general item representations in recommender systems.

Key words: recommendation model, sequential data, matrix product operator, overfitting, underfitting

CLC number: TP391
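
To make the mechanism in the abstract concrete, below is a minimal numpy sketch of an exact MPO factorization by sequential SVDs, together with the trainable/frozen split that the two strategies manipulate. The function names, shapes, and the choice of which cores to train are illustrative assumptions, not the paper's published code; truncating the SVD ranks is where compression would enter.

```python
# A minimal sketch (illustrative assumptions, not the paper's code): factor a
# weight matrix W into a chain of MPO cores via sequential SVDs. Lightweight
# fine-tuning then updates only the small auxiliary cores and freezes the
# parameter-heavy central one; over-parameterized fine-tuning would further
# split the central core into additional trainable cores.
import numpy as np

def mpo_decompose(W, in_dims, out_dims):
    """Factor W (prod(in_dims) x prod(out_dims)) into cores of shape
    (r_{k-1}, i_k, o_k, r_k). Full SVD ranks are kept, so the factorization
    is exact; truncating them is where compression enters."""
    n = len(in_dims)
    T = W.reshape(*in_dims, *out_dims)
    # Interleave input/output modes so core k owns the (i_k, o_k) pair.
    T = np.transpose(T, [ax for k in range(n) for ax in (k, n + k)])
    cores, rank = [], 1
    for k in range(n - 1):
        M = T.reshape(rank * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        cores.append(U.reshape(rank, in_dims[k], out_dims[k], S.size))
        T, rank = np.diag(S) @ Vt, S.size
    cores.append(T.reshape(rank, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_contract(cores):
    """Rebuild the full matrix from the cores (sanity check for the sketch)."""
    T = cores[0]
    for core in cores[1:]:
        T = np.tensordot(T, core, axes=1)   # contract the shared bond index
    T = np.squeeze(T, axis=(0, -1))         # drop the trivial boundary bonds
    n = T.ndim // 2                         # axes are (i_1, o_1, ..., i_n, o_n)
    T = np.transpose(T, list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2)))
    return T.reshape(int(np.prod(T.shape[:n])), -1)

W = np.random.randn(64, 48)
cores = mpo_decompose(W, in_dims=(4, 4, 4), out_dims=(4, 3, 4))
assert np.allclose(mpo_contract(cores), W)  # exact without rank truncation
# Lightweight fine-tuning: freeze the parameter-heavy central core,
# train only the small auxiliary cores.
central = max(range(len(cores)), key=lambda k: cores[k].size)
trainable = [k != central for k in range(len(cores))]
```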

Fig. 1 Lightweight fine-tuning and over-parameterized fine-tuning

Table 1 Statistics of the evaluation datasets

Dataset      #Users   #Items   #Interactions   Avg. n   Avg. c
Scientific    8,842    4,385      52,427        7.04    182.87
Pantry       13,101    4,898     126,962        9.69     83.17
Instruments  24,962    9,964     208,926        8.37    165.18
Arts         45,486   21,019     395,150        8.69    155.57
Office       87,436   25,986     684,837        7.84    193.22

Table 2 Results compared with different baseline models

Dataset      Metric    S3Rec    BERT4Rec  CCDR     UniSRec  MPORec   MPORecLight
Scientific   hit@10    0.0525   0.0488    0.0695   0.1095   0.1103   0.1116
             hit@50    0.1418   0.1185    0.1647   0.2119   0.2056   0.2222
             ndcg@10   0.0275   0.0243    0.0340   0.0598   0.0596   0.0599
             ndcg@50   0.0468   0.0393    0.0546   0.0835   0.0835   0.0837
Pantry       hit@10    0.0444   0.0308    0.0480   0.0627   0.0664   0.0605
             hit@50    0.1315   0.1030    0.1262   0.1711   0.1790   0.1701
             ndcg@10   0.0214   0.0152    0.0203   0.0308   0.0324   0.0305
             ndcg@50   0.0400   0.0305    0.0385   0.0542   0.0568   0.0541
Instruments  hit@10    0.1056   0.0813    0.0848   0.1124   0.1164   0.1078
             hit@50    0.1927   0.1454    0.1753   0.2086   0.2200   0.1968
             ndcg@10   0.0713   0.0620    0.0451   0.0658   0.0676   0.0629
             ndcg@50   0.0901   0.0756    0.0647   0.0867   0.0901   0.0823
Arts         hit@10    0.1103   0.0722    0.0671   0.1018   0.1019   0.0934
             hit@50    0.1888   0.1367    0.1478   0.1993   0.1998   0.1861
             ndcg@10   0.0601   0.0479    0.0348   0.0573   0.0575   0.0519
             ndcg@50   0.0793   0.0619    0.0523   0.0784   0.0789   0.0720
Office       hit@10    0.1030   0.0825    0.0549   0.0947   0.0958   0.0828
             hit@50    0.1613   0.1227    0.1095   0.1647   0.1684   0.1442
             ndcg@10   0.0653   0.0634    0.0290   0.0560   0.0561   0.0496
             ndcg@50   0.0780   0.0721    0.0409   0.0713   0.0714   0.0629
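For reference, hit@K and ndcg@K in Tables 2 and 3 are the standard top-K metrics for next-item recommendation with a single held-out ground-truth item; stated in our own notation (not reproduced from the paper):

$$
\mathrm{hit@}K=\frac{1}{|\mathcal{U}|}\sum_{u\in\mathcal{U}}\mathbb{1}\left[\mathrm{rank}_u\le K\right],
\qquad
\mathrm{ndcg@}K=\frac{1}{|\mathcal{U}|}\sum_{u\in\mathcal{U}}\frac{\mathbb{1}\left[\mathrm{rank}_u\le K\right]}{\log_{2}\left(\mathrm{rank}_u+1\right)},
$$

where $\mathrm{rank}_u$ is the position of user $u$'s held-out next item in the model's ranked list. With one relevant item per sequence the ideal DCG equals 1, so NDCG reduces to the reciprocal-log form above.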

Table 3 Results of different fine-tuning strategies

Dataset      Metric    UniSRec_F  MPORec   MPORecLight  MPORec+ex2  MPORec+ex4  MPORec+ex6  Improvement/%
Scientific   hit@10    0.1188     0.1252   0.1121       0.1243      0.1227      0.1220      5.39
             hit@50    0.2394     0.2400   0.2212       0.2360      0.2376      0.2379      0.25
             ndcg@10   0.0641     0.0654   0.0609       0.0653      0.0650      0.0652      2.03
             ndcg@50   0.0903     0.0902   0.0848       0.0897      0.0900      0.0904      0.11
Pantry       hit@10    0.0636     0.0673   0.0619       0.0666      0.0679      0.0692      8.81
             hit@50    0.1658     0.1801   0.1698       0.1794      0.1786      0.1809      9.11
             ndcg@10   0.0306     0.0320   0.0297       0.0317      0.0324      0.0327      6.86
             ndcg@50   0.0527     0.0564   0.0531       0.0561      0.0562      0.0569      7.97
Instruments  hit@10    0.1189     0.1211   0.1092       0.1161      0.1188      0.1200      1.85
             hit@50    0.2255     0.2256   0.2038       0.2201      0.2242      0.2260      0.22
             ndcg@10   0.0680     0.0690   0.0641       0.0673      0.0680      0.0688      1.47
             ndcg@50   0.0912     0.0917   0.0846       0.0898      0.0909      0.0918      0.66
Arts         hit@10    0.1066     0.1083   0.0922       0.1074      0.1050      0.1038      1.59
             hit@50    0.2049     0.2122   0.1833       0.2097      0.2064      0.2043      3.56
             ndcg@10   0.0586     0.0594   0.0502       0.0592      0.0576      0.0571      1.37
             ndcg@50   0.0799     0.0821   0.0701       0.0815      0.0797      0.0790      2.75
Office       hit@10    0.1013     0.1029   0.0880       0.1009      0.1010      0.1007      1.58
             hit@50    0.1702     0.1710   0.1506       0.1691      0.1688      0.1682      0.47
             ndcg@10   0.0619     0.0632   0.0540       0.0625      0.0621      0.0617      2.10
             ndcg@50   0.0769     0.0781   0.0676       0.0773      0.0769      0.0765      1.56

Table 4 Effect of the decomposition length parameter n

Length n   3        5        7        9
MPORec     0.1243   0.1252   0.1202   0.1182

Table 5 Effect of the learning rate

Learning rate   1e-4     2e-4     4e-4     6e-4     8e-4
MPORec          0.1197   0.1198   0.1202   0.1228   0.1230

Table 6 Comparison of training efficiency

Model         Total params/M  Trainable params/M  GPU memory/GB  Training time/s
UniSRec       6.3             6.3
UniSRec_F     6.3             1.9                 7.74           1,498
MPORecLight   6.5             0.3                 7.76             703
MPORec        6.5             2.1                 7.78             980
+Expand2      6.6             2.2                 7.81           1,448
+Expand4      6.7             2.3                 7.84           1,567
+Expand6      6.8             2.4                 7.88           3,509
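
As a rough, hypothetical illustration of the trainable-parameter gap in Table 6: in an MPO chain, parameters concentrate in the central core, so freezing it and training only the small auxiliary cores (the lightweight strategy) touches a modest fraction of the layer, while the +Expand variants grow the trainable count. The dimensions below are invented for illustration and are not the paper's configuration:

```python
# Hypothetical MPO parameter count for one 768x768 layer split into 5 cores.
# Core k has shape (r_k, i_k, o_k, r_{k+1}); all numbers are illustrative.
in_dims  = (4, 4, 4, 4, 3)         # prod = 768
out_dims = (4, 4, 4, 4, 3)         # prod = 768
ranks    = (1, 4, 64, 64, 4, 1)    # bond dimensions between cores

sizes = [ranks[k] * in_dims[k] * out_dims[k] * ranks[k + 1]
         for k in range(len(in_dims))]
central = max(range(len(sizes)), key=sizes.__getitem__)
aux = sum(s for k, s in enumerate(sizes) if k != central)
print(f"core sizes: {sizes}")
print(f"lightweight trains {aux}/{sum(sizes)} params "
      f"({aux / sum(sizes):.1%}); the central core stays frozen")
```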
1 LI Jing, REN Pengjie, CHEN Zhumin, et al. Neural attentive session-based recommendation[EB/OL]. (2017-11-13)[2023-08-09]. http://arxiv.org/abs/1711.04725.
2 HOU Yupeng, HU Binbin, ZHANG Zhiqiang, et al. CORE: simple and effective session-based recommendation within consistent representation space[C]//SIGIR'22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: Association for Computing Machinery, 2022: 1796-1801.
3 HOU Y, MU S, ZHAO W X, et al. Towards universal sequence representation learning for recommender systems[EB/OL]. (2022-06-13)[2023-04-13]. http://arxiv.org/abs/2206.05941.
4 XU Runxin, LUO Fuli, ZHANG Zhiyuan, et al. Raise a child in large language model: towards effective and generalizable fine-tuning[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana: Association for Computational Linguistics, 2021: 9514-9528.
5 HIDASI B, KARATZOGLOU A, BALTRUNAS L, et al. Session-based recommendations with recurrent neural networks[C/OL]//4th International Conference on Learning Representations (ICLR 2016). 2016: 1-10. http://arxiv.org/pdf/1511.06939.
6 ZHOU K, YU H, ZHAO W X, et al. Filter-enhanced MLP is all you need for sequential recommendation[C]//WWW '22: The ACM Web Conference 2022. Lyon: ACM, 2022: 2388-2399.
7 CHANG Jianxin, GAO Chen, ZHENG Yu, et al. Sequential recommendation with graph neural networks[EB/OL]. (2023-07-26)[2023-08-09]. http://arxiv.org/abs/2106.14226.
8 YUAN F, HE X, KARATZOGLOU A, et al. Parameter-efficient transfer from sequential behaviors for user modeling and recommendation[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020. New York: ACM, 2020: 1469-1478.
9 GAO Zefeng, ZHOU Kun, LIU Peiyu, et al. Small pre-trained language models can be fine-tuned as large models via over-parameterization[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics ACL 2023. Toronto: Association for Computational Linguistics, 2023: 3819-3834.
10 GAO Zefeng, CHENG Song, HE Rongqiang, et al. Compressing deep neural networks by matrix product operators[J]. Physical Review Research, 2020, 2(2): 023300.
11 GAO Zefeng, SUN Xingwei, GAO Lan, et al. Compressing LSTM networks by matrix product operators[EB/OL]. (2022-03-31)[2023-05-06]. https://arxiv.org/abs/2012.11943.
12 NOVIKOV A, PODOPRIKHIN D, OSOKIN A, et al. Tensorizing neural networks[EB/OL]. (2015-09-22)[2023-05-06]. https://arxiv.org/abs/1509.06569.
13 GARIPOV T, PODOPRIKHIN D, NOVIKOV A, et al. Ultimate tensorization: compressing convolutional and FC layers alike[EB/OL]. (2016-11-10)[2023-05-06]. https://arxiv.org/abs/1611.03214.
14 LIU P, GAO Z F, ZHAO W X, et al. Enabling lightweight fine-tuning for pre-trained language model compression based on matrix product operators[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2021: 5388-5398.
15 GAO Z F, LIU P, ZHAO W X, et al. Parameter-efficient mixture-of-experts architecture for pre-trained language models[C]//Proceedings of the 29th International Conference on Computational Linguistics. Gyeongju: International Committee on Computational Linguistics, 2022: 3263-3273.
16 LIU Peiyu, GAO Zefeng, CHEN Yushuo, et al. Scaling pre-trained language models to deeper via parameter-efficient architecture[EB/OL]. (2023-04-10)[2023-05-06]. http://arxiv.org/abs/2303.16753.
17 SUN Xingwei, GAO Zefeng, LU Zhengyi, et al. A model compression method with matrix product operators for speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 2837-2847.
18 HU E J, SHEN Y, WALLIS P, et al. LoRA: low-rank adaptation of large language models[EB/OL]. (2021-10-16)[2022-06-16]. http://arxiv.org/abs/2106.09685.
19 NI J, LI J, MCAULEY J J. Justifying recommendations using distantly-labeled reviews and fine-grained aspects[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Hong Kong: Association for Computational Linguistics, 2019: 188-197.
20 ZHOU K, WANG H, ZHAO W X, et al. S3Rec: self-supervised learning for sequential recommendation with mutual information maximization[C/OL]//Proceedings of the 29th ACM International Conference on Information & Knowledge Management. [2023-08-08]. https://doi.org/10.1145/3340531.3411954.
21 SUN Fei, LIU Jun, WU Jian, et al. BERT4Rec: sequential recommendation with bidirectional encoder representations from transformer[C]//Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Beijing: ACM, 2019: 1441-1450.
22 WEN Keyu, TAN Zhenshan, CHENG Qingrong, et al. Contrastive cross-modal knowledge sharing pre-training for vision-language representation learning and retrieval[EB/OL]. (2022-07-08)[2023-10-18]. http://arxiv.org/abs/2207.00733.
[1] 邵伟, 朱高宇, 于雷, 郭嘉丰. Dimensionality reduction and retrieval algorithms for high-dimensional data[J]. Journal of Shandong University (Natural Science), 2024, 59(7): 27-43.
[2] 杨纪元, 马沐阳, 任鹏杰, 陈竹敏, 任昭春, 辛鑫, 蔡飞, 马军. Research on self-supervised pre-training for recommender systems[J]. Journal of Shandong University (Natural Science), 2024, 59(7): 1-26.
[3] 陈海粟, 廖佳纯, 姚思诚. Identification and statistics of personal information disclosure in open government data[J]. Journal of Shandong University (Natural Science), 2024, 59(3): 95-106.
[4] 温欣, 李德玉. An ML-KNN method based on attribute weighting[J]. Journal of Shandong University (Natural Science), 2024, 59(3): 107-117.
[5] 曾雪强, 孙雨, 刘烨, 万中英, 左家莉, 王明文. Emoji embedding representation based on emotion distribution[J]. Journal of Shandong University (Natural Science), 2024, 59(3): 81-94.
[6] 牛泽群, 李晓戈, 强成宇, 韩伟, 姚怡, 刘洋. Entity disambiguation method based on graph attention neural networks[J]. Journal of Shandong University (Natural Science), 2024, 59(3): 71-80, 94.
[7] 史春雨, 毛煜, 刘浩阳, 林耀进. Hierarchical feature selection algorithm based on sample correlation[J]. Journal of Shandong University (Natural Science), 2024, 59(3): 61-70.
[8] 卢婵, 郭军军, 谭凯文, 相艳, 余正涛. Multimodal sentiment analysis based on text-guided hierarchical adaptive fusion[J]. Journal of Shandong University (Natural Science), 2023, 58(12): 31-40, 51.
[9] 王新生, 朱小飞, 李程鸿. Label-guided multi-scale graph neural network method for protein interaction prediction[J]. Journal of Shandong University (Natural Science), 2023, 58(12): 22-30.
[10] 张乃洲, 曹薇. A memory network query suggestion model based on text semantic expansion[J]. Journal of Shandong University (Natural Science), 2023, 58(12): 10-21.
[11] 陈淑珍, 史开泉, 李守伟. Embedding generation of micro-information and its intelligent hiding and restoration[J]. Journal of Shandong University (Natural Science), 2023, 58(12): 1-9.
[12] 仲诚诚, 周恒, 张梓童, 张春雷. LAC-UNet: a semantic segmentation model based on capsule representation of local-whole feature relations[J]. Journal of Shandong University (Natural Science), 2023, 58(11): 116-126.
[13] 吴贤君, 唐绍诗, 王明秋. Personalized recommendation for mobile users integrating basic attributes and communication behavior[J]. Journal of Shandong University (Natural Science), 2023, 58(9): 81-93.
[14] 那宇嘉, 谢珺, 杨海洋, 续欣莹. Knowledge graph completion method incorporating context[J]. Journal of Shandong University (Natural Science), 2023, 58(9): 71-80.
[15] 李程, 车文刚, 高盛祥. An object detection algorithm for aerial images[J]. Journal of Shandong University (Natural Science), 2023, 58(9): 59-70.