JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2024, Vol. 59, Issue (7): 95-104. DOI: 10.6040/j.issn.1671-9352.1.2023.026

• Review •

Probability distribution optimization model for learning to rank

Fengxu ZHAO1, Jian WANG1, Yuan LIN2,*, Hongfei LIN1

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
    2. School of Public Administration and Policy, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2023-10-18 Online: 2024-07-20 Published: 2024-07-15
  • Contact: Yuan LIN E-mail: 22209214@mail.dlut.edu.cn; zhlin@dlut.edu.cn

Abstract:

Existing learning to rank (LTR) models rely on model-output scores to represent the partial order among documents. To address the limitation of treating these scores as deterministic values, this paper proposes a probability distribution optimization (PDO) method for LTR models that introduces uncertainty into the ranking score. The method smooths the scores into probability distributions, thereby turning the comparison of ranking scores into an estimate of the probability that one score exceeds another. Applied to LTR models such as RankNet, LambdaRank, and LambdaMART, it effectively narrows the gap between the modeled probability and the target probability and thereby improves the underlying models. Experiments on several large-scale real-world datasets show that the optimized models outperform the originals, validating the effectiveness of the proposed method.
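As a rough sketch of the idea (our reading based on the abstract and the figure captions, not the authors' exact formulation): if each ranking score is smoothed into a Gaussian with a shared width σs, the pairwise preference probability becomes the Gaussian integral (normal CDF) of the score difference rather than the sigmoid of the raw score difference used by RankNet-style models. All names below (sigmoid_pair_prob, gaussian_pair_prob, sigma_s) are illustrative choices, not the paper's API.

```python
import math

def sigmoid_pair_prob(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    """RankNet-style modeled probability that document i is ranked above document j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))

def gaussian_pair_prob(s_i: float, s_j: float, sigma_s: float = 1.0) -> float:
    """Pairwise probability when each score is smoothed into N(score, sigma_s^2).

    For independent Gaussians, P(S_i > S_j) = Phi((s_i - s_j) / (sqrt(2) * sigma_s)),
    where Phi is the standard normal CDF (written here via math.erf).
    """
    z = (s_i - s_j) / (math.sqrt(2.0) * sigma_s)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

if __name__ == "__main__":
    # Compare the two modeled probabilities for the same score gap.
    print(sigmoid_pair_prob(1.2, 0.8), gaussian_pair_prob(1.2, 0.8))
```

A pairwise loss such as RankNet's cross entropy can then be evaluated against the target probability with this smoothed modeled probability in place of the sigmoid, which is the "bridging the gap" effect the abstract refers to.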

Key words: information retrieval, learning to rank, probability distribution

CLC Number: TP391

Fig.1  Schematic diagram of Gaussian integral probability calculation
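For reference, under the independent-Gaussian reading above (means equal to the raw scores s_i, s_j and a shared standard deviation σs; the paper's exact notation may differ), the Gaussian integral probability of the partial order reduces to a standard normal CDF of the score difference:

```latex
P(S_i > S_j)
  = \int_{-\infty}^{\infty} \mathcal{N}\!\left(x;\, s_i, \sigma_s^{2}\right)
    \left[ \int_{-\infty}^{x} \mathcal{N}\!\left(y;\, s_j, \sigma_s^{2}\right) \mathrm{d}y \right] \mathrm{d}x
  = \Phi\!\left( \frac{s_i - s_j}{\sqrt{2}\,\sigma_s} \right)
```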

Fig.2  Comparison of Sigmoid and Gaussian integral probability for modeled probability calculation

Table 1  Statistics of the Web30K, Yahoo, and Istella datasets

Dataset   Features   Queries (train / valid / test)   Documents (train / valid / test)
Web30K    136        18,919 / 6,306 / 6,306           2,270,296 / 747,218 / 753,611
Yahoo     700        19,944 / 2,994 / 6,983           473,134 / 71,083 / 165,660
Istella   220        20,901 / 2,318 / 9,799           6,587,822 / 737,803 / 3,129,004

Fig.3  The impact of different σs values on model performance on the Yahoo validation set

Fig.4  The impact of different σs values on modeled probability
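To make the role of σs concrete under the same illustrative reading: σs controls how sharply the modeled probability responds to a given score gap, which is what Figs. 3 and 4 probe. A small self-contained sketch (the helper name is ours, not the paper's):

```python
import math

def gaussian_pair_prob(s_i: float, s_j: float, sigma_s: float) -> float:
    # Same illustrative helper as in the earlier sketch.
    z = (s_i - s_j) / (math.sqrt(2.0) * sigma_s)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Fixed score gap of 0.4; vary the smoothing width sigma_s.
for sigma_s in (0.25, 0.5, 1.0, 2.0, 4.0):
    p = gaussian_pair_prob(1.2, 0.8, sigma_s)
    print(f"sigma_s={sigma_s:<4}  P(S_i > S_j)={p:.3f}")
# Smaller sigma_s pushes the modeled probability toward 0 or 1 (harder ordering);
# larger sigma_s pulls it toward 0.5 (softer, more uncertain ordering).
```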

Table 2  Comparative experimental results on the Web30K, Yahoo, and Istella test sets

Model            Web30K                    Yahoo                     Istella
                 N@1     N@5     N@10      N@1     N@5     N@10      N@1     N@5     N@10
RankSVM          0.3010  0.3350  0.3650    0.6370  0.6740  0.7260    0.5269  0.5041  0.5529
RankBoost        0.2788  0.3043  0.3339    0.6293  0.6661  0.7159    0.4457  0.4097  0.4511
LambdaMART       0.4535  0.4459  0.4646    0.6852  0.7027  0.7458    0.6571  0.6118  0.6591
GSF              0.4129  0.4151  0.4374    0.6429  0.6838  0.7316    0.6224  0.5968  0.6508
DLCM             0.4630  0.4500  0.4690    0.6770  0.6990  0.7430    0.6558  0.6194  0.6680
SetRank          0.4290  0.4220  0.4428    0.6711  0.6960  0.7398    0.6733  0.6278  0.6737
SetRank-re       0.4591  0.4515  0.4696    0.6822  0.7029  0.7453    0.6760  0.6345  0.6834
RankNet-PDO      0.4069  0.4117  0.4352    0.6233  0.6585  0.7047    0.6177  0.5910  0.6506
LambdaRank-PDO   0.4206  0.4180  0.4409    0.6745  0.7085? 0.7571    0.6618  0.6323  0.6825
LambdaMART-PDO   0.4584  0.4505  0.4692    0.6880? 0.7061  0.7488    0.6744  0.6488? 0.7106?
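In Tables 2 and 3, N@k denotes NDCG@k (normalized discounted cumulative gain at cutoff k). For readers who want to check the metric, a standard computation with the common 2^rel − 1 gain and log2 rank discount is sketched below; the paper's evaluation tooling may differ in details such as tie handling.

```python
import math

def dcg_at_k(labels, k):
    """DCG with gain 2^rel - 1 and log2(position + 1) discount."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(labels[:k]))

def ndcg_at_k(labels_in_ranked_order, k):
    """NDCG@k: DCG of the produced ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(labels_in_ranked_order, reverse=True), k)
    return dcg_at_k(labels_in_ranked_order, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Example: graded relevance labels in the order the model ranked the documents.
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], 5), 4))
```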

Table 3  Ablation results of probability distribution optimization (PDO) on learning to rank models

Model         PDO   Web30K                    Yahoo                     Istella
                    N@1     N@5     N@10      N@1     N@5     N@10      N@1     N@5     N@10
RankNet       ×     0.3316  0.3469  0.3759    0.5997  0.6504  0.7043    0.5967  0.5817  0.6482
              √     0.4069? 0.4117? 0.4352?   0.6233? 0.6585? 0.7047    0.6177? 0.5910? 0.6506
LambdaRank    ×     0.3922  0.4043  0.4256    0.6332  0.6762  0.7235    0.6324  0.5984  0.6644
              √     0.4206? 0.4180? 0.4409?   0.6745? 0.7085? 0.7571?   0.6618? 0.6323? 0.6825?
LambdaMART    ×     0.4535  0.4459  0.4646    0.6852  0.7027  0.7458    0.6571  0.6118  0.6591
              √     0.4584? 0.4505? 0.4692?   0.6880? 0.7061  0.7488    0.6744? 0.6488? 0.7106?