Journal of Shandong University (Natural Science) ›› 2016, Vol. 51 ›› Issue (7): 18-22. doi: 10.6040/j.issn.1671-9352.1.2015.031
孟烨,张鹏,宋大为
MENG Ye, ZHANG Peng, SONG Da-wei
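Abstract: Pseudo-relevance feedback is a query expansion technique that can effectively improve retrieval performance. A key question for this technique is how to choose the parameter that balances the weight of the original query against that of the expansion terms so as to achieve the best retrieval effectiveness. In previous feedback models, this balance parameter is set to a fixed empirical value across all collections. Because collections differ, however, the balance parameter should vary with the collection. This paper analyzes statistical characteristics of collections to uncover their relationship with the optimal balance parameter and thereby guide its selection, focusing on features such as the dispersion of document lengths and the proportion of low-frequency terms in the collection and among the query expansion terms. Analysis of experimental results on six standard TREC collections leads to the conclusion that the higher the proportion of special terms and the greater the dispersion of document lengths, the more weight the original query should receive.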
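To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the common linear interpolation between an original query model and an expansion-term model, uses the coefficient of variation as one possible measure of document-length dispersion, and counts terms occurring at most `max_count` times as "low-frequency". The function names, the `weight_original` parameter, and the `max_count` threshold are all hypothetical; the abstract does not give the paper's exact definitions.

```python
from collections import Counter
from statistics import mean, pstdev


def interpolate_query(original, expansion, weight_original):
    """Mix two term distributions: w * original + (1 - w) * expansion.

    weight_original plays the role of the balance parameter (assumed form).
    """
    terms = set(original) | set(expansion)
    return {
        t: weight_original * original.get(t, 0.0)
        + (1.0 - weight_original) * expansion.get(t, 0.0)
        for t in terms
    }


def length_dispersion(docs):
    """Coefficient of variation of document lengths (assumed dispersion measure)."""
    lengths = [len(d) for d in docs]
    return pstdev(lengths) / mean(lengths)


def low_frequency_ratio(docs, max_count=1):
    """Share of distinct terms whose collection frequency is <= max_count."""
    counts = Counter(t for d in docs for t in d)
    rare = sum(1 for c in counts.values() if c <= max_count)
    return rare / len(counts)


# Toy usage: documents are lists of tokens, query models are term -> probability.
docs = [["a", "b"], ["a", "c", "d", "e"]]
mixed = interpolate_query(
    {"apple": 0.5, "pie": 0.5}, {"pie": 0.5, "recipe": 0.5}, weight_original=0.7
)
```

Under the abstract's conclusion, a collection with a larger `length_dispersion` and a higher share of special terms would call for a larger `weight_original`; how to map these statistics to a concrete value is left to the paper's experiments.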