JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2026, Vol. 61 ›› Issue (3): 66-74.doi: 10.6040/j.issn.1671-9352.1.2024.061


Method for verbose query reduction by integrating key and latent concepts

ZHU Mingyang1,2, HUANG Yuxin1,2, YU Zhengtao1,2*   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China;
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Published: 2026-03-18

Abstract: Query reduction aims to improve retrieval recall and precision by condensing lengthy queries while retaining their key information. Traditional methods often rely on statistical approaches or pre-trained models to extract keywords from lengthy queries as retrieval input. However, these methods struggle with query complexity (e.g., synonymy and polysemy) and often lose crucial information. To address these issues, a verbose query reduction method integrating key concepts and latent concepts is proposed. It combines key concepts, which represent the core content of the query, with latent concepts, which are crucial for understanding the query but are not explicitly expressed, to generate more comprehensive and effective queries. Specifically, a pre-trained model generates concise and effective queries as key concepts, while a pseudo-relevance feedback method extracts latent concepts from a set of documents relevant to the original query. Finally, both are combined to form the reduced query for retrieval. Experimental results on the Robust2004 dataset with a dense retrieval model show that the proposed method improves R@1000 and NDCG@10 by 2.1% and 3.6%, respectively, over baseline models.
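The two-stage pipeline the abstract describes can be sketched in plain Python. This is a minimal illustration, not the paper's method: the pre-trained key-concept generator is stood in for by a simple frequency-based keyword picker, and the pseudo-relevance feedback step is reduced to term-overlap ranking plus frequent-term mining over the top documents. All function names (`key_concepts`, `latent_concepts`, `reduce_query`) are hypothetical.

```python
from collections import Counter
import re

STOP = {"the", "a", "of", "and", "in", "to", "for", "is", "on", "with"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def key_concepts(verbose_query, k=3):
    # Stand-in for the pre-trained generator: keep the k most frequent
    # non-stopword terms of the verbose query as its "core content".
    counts = Counter(t for t in tokenize(verbose_query) if t not in STOP)
    return [t for t, _ in counts.most_common(k)]

def latent_concepts(verbose_query, corpus, top_docs=2, k=3):
    # Pseudo-relevance feedback: rank documents by term overlap with the
    # query, then mine frequent terms from the top-ranked documents that
    # do NOT already appear in the query (the "not explicitly expressed"
    # concepts).
    q_terms = set(tokenize(verbose_query))
    ranked = sorted(corpus, key=lambda d: -len(q_terms & set(tokenize(d))))
    feedback = Counter()
    for doc in ranked[:top_docs]:
        feedback.update(t for t in tokenize(doc)
                        if t not in STOP and t not in q_terms)
    return [t for t, _ in feedback.most_common(k)]

def reduce_query(verbose_query, corpus):
    # Final reduced query: key concepts first, latent concepts appended,
    # with duplicates removed.
    kc = key_concepts(verbose_query)
    lc = latent_concepts(verbose_query, corpus)
    return kc + [t for t in lc if t not in kc]

if __name__ == "__main__":
    corpus = [
        "query reduction with keyword extraction and concept mining",
        "verbose query processing using concept feedback",
        "cooking recipes and kitchen tips",
    ]
    print(reduce_query(
        "query reduction query reduction for verbose query retrieval",
        corpus))
```

In the paper, the key-concept stage would instead be a generative pre-trained model and the feedback stage would operate over dense-retrieval results, but the combination logic (union of explicit and latent terms forming the final query) follows the same shape.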

Key words: information retrieval, verbose query, query reduction, key concept, latent concept

CLC Number: TP391