JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2024, Vol. 59, Issue (7): 95-104. DOI: 10.6040/j.issn.1671-9352.1.2023.026

• Review •

Probability distribution optimization model for learning to rank

Fengxu ZHAO1, Jian WANG1, Yuan LIN2,*, Hongfei LIN1

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
    2. School of Public Administration and Policy, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2023-10-18 Online: 2024-07-20 Published: 2024-07-15
  • Contact: Yuan LIN E-mail: 22209214@mail.dlut.edu.cn; zhlin@dlut.edu.cn

Abstract:

Existing learning to rank (LTR) models rely on model-output scores to represent the partial order among documents. To address the limitation of treating these scores as deterministic values, this paper proposes a probability distribution optimization (PDO) method for LTR models that introduces uncertainty into the ranking score. The method smooths the scores into probability distributions, thereby turning the comparison of ranking scores into an estimate of the probability that one score exceeds another. Applied to LTR models such as RankNet, LambdaRank, and LambdaMART, it effectively narrows the gap between the modeled probability and the target probability and thereby improves the underlying models. Experiments on several large-scale real-world datasets show that the optimized models outperform the originals, validating the effectiveness of the proposed method.
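As a rough sketch of the idea (our reading based on the abstract and the figure captions, not the authors' exact formulation): if each ranking score is smoothed into a Gaussian with a shared width σs, the pairwise preference probability becomes the Gaussian integral (normal CDF) of the score difference rather than the sigmoid of the raw score difference used by RankNet-style models. All names below (sigmoid_pair_prob, gaussian_pair_prob, sigma_s) are illustrative choices, not the paper's API.

```python
import math

def sigmoid_pair_prob(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    """RankNet-style modeled probability that document i is ranked above document j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))

def gaussian_pair_prob(s_i: float, s_j: float, sigma_s: float = 1.0) -> float:
    """Pairwise probability when each score is smoothed into N(score, sigma_s^2).

    For independent Gaussians, P(S_i > S_j) = Phi((s_i - s_j) / (sqrt(2) * sigma_s)),
    where Phi is the standard normal CDF (written here via math.erf).
    """
    z = (s_i - s_j) / (math.sqrt(2.0) * sigma_s)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    # Compare the two modeled probabilities for the same score gap.
    print(sigmoid_pair_prob(1.2, 0.8), gaussian_pair_prob(1.2, 0.8))
```

A pairwise loss such as RankNet's cross entropy can then be evaluated against the target probability with this smoothed modeled probability in place of the sigmoid, which is the "bridging the gap" effect the abstract refers to.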

Key words: information retrieval, learning to rank, probability distribution

CLC Number: TP391

Fig.1  Schematic diagram of Gaussian integral probability calculation
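For reference, under the independent-Gaussian reading above (means equal to the raw scores s_i, s_j and a shared standard deviation σs; the paper's exact notation may differ), the Gaussian integral probability of the partial order reduces to a standard normal CDF of the score difference:

```latex
P(S_i > S_j)
  = \int_{-\infty}^{\infty} \mathcal{N}\!\left(x;\, s_i, \sigma_s^{2}\right)
    \left[ \int_{-\infty}^{x} \mathcal{N}\!\left(y;\, s_j, \sigma_s^{2}\right) \mathrm{d}y \right] \mathrm{d}x
  = \Phi\!\left( \frac{s_i - s_j}{\sqrt{2}\,\sigma_s} \right)
```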

Fig.2  Comparison of Sigmoid and Gaussian integral probability for modeled probability calculation

Table 1  Statistics of the Web30K, Yahoo, and Istella datasets

Dataset   Features   Queries (train / valid / test)   Documents (train / valid / test)
Web30K    136        18,919 / 6,306 / 6,306           2,270,296 / 747,218 / 753,611
Yahoo     700        19,944 / 2,994 / 6,983           473,134 / 71,083 / 165,660
Istella   220        20,901 / 2,318 / 9,799           6,587,822 / 737,803 / 3,129,004

Fig.3  The impact of different σs values on model performance on the Yahoo validation set

Fig.4  The impact of different σs values on modeled probability
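To make the role of σs concrete under the same illustrative reading: σs controls how sharply the modeled probability responds to a given score gap, which is what Figs. 3 and 4 probe. A small self-contained sketch (the helper name is ours, not the paper's):

```python
import math

def gaussian_pair_prob(s_i: float, s_j: float, sigma_s: float) -> float:
    # Same illustrative helper as in the earlier sketch.
    z = (s_i - s_j) / (math.sqrt(2.0) * sigma_s)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Fixed score gap of 0.4; vary the smoothing width sigma_s.
for sigma_s in (0.25, 0.5, 1.0, 2.0, 4.0):
    p = gaussian_pair_prob(1.2, 0.8, sigma_s)
    print(f"sigma_s={sigma_s:<4}  P(S_i > S_j)={p:.3f}")
# Smaller sigma_s pushes the modeled probability toward 0 or 1 (harder ordering);
# larger sigma_s pulls it toward 0.5 (softer, more uncertain ordering).
```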

Table 2  Comparative experimental results on the Web30K, Yahoo, and Istella test sets

Model            Web30K                    Yahoo                     Istella
                 N@1     N@5     N@10      N@1     N@5     N@10      N@1     N@5     N@10
RankSVM          0.3010  0.3350  0.3650    0.6370  0.6740  0.7260    0.5269  0.5041  0.5529
RankBoost        0.2788  0.3043  0.3339    0.6293  0.6661  0.7159    0.4457  0.4097  0.4511
LambdaMART       0.4535  0.4459  0.4646    0.6852  0.7027  0.7458    0.6571  0.6118  0.6591
GSF              0.4129  0.4151  0.4374    0.6429  0.6838  0.7316    0.6224  0.5968  0.6508
DLCM             0.4630  0.4500  0.4690    0.6770  0.6990  0.7430    0.6558  0.6194  0.6680
SetRank          0.4290  0.4220  0.4428    0.6711  0.6960  0.7398    0.6733  0.6278  0.6737
SetRank-re       0.4591  0.4515  0.4696    0.6822  0.7029  0.7453    0.6760  0.6345  0.6834
RankNet-PDO      0.4069  0.4117  0.4352    0.6233  0.6585  0.7047    0.6177  0.5910  0.6506
LambdaRank-PDO   0.4206  0.4180  0.4409    0.6745  0.7085? 0.7571    0.6618  0.6323  0.6825
LambdaMART-PDO   0.4584  0.4505  0.4692    0.6880? 0.7061  0.7488    0.6744  0.6488? 0.7106?
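In Tables 2 and 3, N@k denotes NDCG@k (normalized discounted cumulative gain at cutoff k). For readers who want to check the metric, a standard computation with the common 2^rel − 1 gain and log2 rank discount is sketched below; the paper's evaluation tooling may differ in details such as tie handling.

```python
import math

def dcg_at_k(labels, k):
    """DCG with gain 2^rel - 1 and log2(position + 1) discount."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

def ndcg_at_k(labels_in_ranked_order, k):
    """NDCG@k: DCG of the produced ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(labels_in_ranked_order, reverse=True), k)
    return dcg_at_k(labels_in_ranked_order, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: graded relevance labels in the order the model ranked the documents.
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], 5), 4))
```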

Table 3  Ablation results of probability distribution optimization (PDO) on learning to rank models

Model         PDO   Web30K                    Yahoo                     Istella
                    N@1     N@5     N@10      N@1     N@5     N@10      N@1     N@5     N@10
RankNet       ×     0.3316  0.3469  0.3759    0.5997  0.6504  0.7043    0.5967  0.5817  0.6482
              √     0.4069? 0.4117? 0.4352?   0.6233? 0.6585? 0.7047    0.6177? 0.5910? 0.6506
LambdaRank    ×     0.3922  0.4043  0.4256    0.6332  0.6762  0.7235    0.6324  0.5984  0.6644
              √     0.4206? 0.4180? 0.4409?   0.6745? 0.7085? 0.7571?   0.6618? 0.6323? 0.6825?
LambdaMART    ×     0.4535  0.4459  0.4646    0.6852  0.7027  0.7458    0.6571  0.6118  0.6591
              √     0.4584? 0.4505? 0.4692?   0.6880? 0.7061  0.7488    0.6744? 0.6488? 0.7106?