JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2016, Vol. 51 ›› Issue (7): 18-22. doi: 10.6040/j.issn.1671-9352.1.2015.031


Study on collection statistics for parameter selection in pseudo relevance feedback

MENG Ye, ZHANG Peng, SONG Da-wei   

  1. Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300072, China
  • Received: 2015-11-14  Online: 2016-07-20  Published: 2016-07-27

Abstract: Pseudo-relevance feedback (PRF) is an effective technique for improving ad hoc retrieval performance. For PRF methods, optimizing the balance parameter between the original query model and the feedback model is an important but difficult problem. Current feedback methods often set this parameter to a fixed value across all collections. However, because collections differ, the parameter should be tuned for each collection. In this paper, we seek meaningful clues for optimizing the balance parameter by analyzing the statistical features of collections. We investigate the dependency between the optimal parameter and several collection statistics, including the standard deviation of document length (Dev(dl)) and the proportion of low-frequency terms in the collection (LFT-C) and in the expansion terms. Experiments on six TREC collections demonstrate that the higher LFT-C and Dev(dl) are, the larger the weight that should be given to the original query model.
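
As a rough illustration of the quantities named in the abstract, the following Python sketch shows a linear interpolation between a query model and a feedback model, together with two collection statistics. It is only a sketch under stated assumptions, not the authors' implementation: the function names, the `query_weight` parameter, the use of the population standard deviation for Dev(dl), and the `max_freq` cutoff defining a "low frequency" term are all hypothetical choices, since the abstract does not give exact definitions.

```python
import math
from collections import Counter


def interpolate(query_model, feedback_model, query_weight):
    """Mix two unigram models (dicts term -> probability).

    Assumed form: query_weight * P(w|Q) + (1 - query_weight) * P(w|F),
    where query_weight plays the role of the balance parameter.
    """
    vocab = set(query_model) | set(feedback_model)
    return {
        w: query_weight * query_model.get(w, 0.0)
        + (1.0 - query_weight) * feedback_model.get(w, 0.0)
        for w in vocab
    }


def dev_dl(docs):
    """Population standard deviation of document length, in tokens.

    Assumption: each document is a list of tokens.
    """
    lengths = [len(d) for d in docs]
    mean = sum(lengths) / len(lengths)
    return math.sqrt(sum((n - mean) ** 2 for n in lengths) / len(lengths))


def lft_c(docs, max_freq=1):
    """Proportion of distinct terms whose collection frequency is <= max_freq.

    Assumption: max_freq is a hypothetical cutoff for "low frequency".
    """
    counts = Counter(term for d in docs for term in d)
    low = sum(1 for c in counts.values() if c <= max_freq)
    return low / len(counts)
```

Under the abstract's finding, a collection with higher `lft_c` and `dev_dl` values would call for a larger `query_weight`; the sketch does not attempt to reproduce how that mapping was estimated.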

Key words: information retrieval, pseudo-relevance feedback, collection characteristics

CLC Number: TP393