Journal of Shandong University (Natural Science), 2017, Vol. 52, Issue (9): 13-18. doi: 10.6040/j.issn.1671-9352.0.2016.107
WANG Kai, HONG Yu*, QIU Ying-ying, WANG Jian, YAO Jian-min, ZHOU Guo-dong
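Abstract: To satisfy a single search intent, users often need to submit several queries. Effectively identifying the boundaries at which intent changes between consecutive queries helps a retrieval system understand the user's complete search intent, which improves query recommendation and query expansion and supports building user models for personalized search. Building on a thorough analysis of the effective features used in previous work, this paper proposes a method for detecting intent boundaries based on topic similarity, which yields improvements on both SVM and CRF models. Experimental results show that the best performance of the proposed method improves the F-score by 2% over the Baseline system.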
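As a rough illustration of the idea of using topic similarity to locate intent boundaries, the sketch below compares the topic distributions of consecutive queries and flags a boundary when their cosine similarity falls below a threshold. This is not the paper's method, which feeds topic similarity into SVM and CRF models as a feature. The function names, the fixed threshold of 0.5, and the toy topic vectors are all assumptions made for this example.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def intent_boundaries(topic_vectors, threshold=0.5):
    """Return the indices i at which query i is judged to start a new intent.

    topic_vectors: one topic distribution per query, in submission order.
    threshold: hypothetical cut-off; a real system would learn the decision
    from labeled data rather than hard-code it.
    """
    return [
        i
        for i in range(1, len(topic_vectors))
        if cosine(topic_vectors[i - 1], topic_vectors[i]) < threshold
    ]


# Toy session: the first two queries share a dominant topic, the third does not.
session = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]
print(intent_boundaries(session))  # [2]
```

In practice the topic vectors would come from a topic model fitted on the query log, and the similarity score would be one feature among several rather than a standalone decision rule.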