
Journal of Shandong University (Natural Science), 2017, Vol. 52, Issue (9): 13-18. doi: 10.6040/j.issn.1671-9352.0.2016.107


Study on boundary detection of users query intents

WANG Kai, HONG Yu*, QIU Ying-ying, WANG Jian, YAO Jian-min, ZHOU Guo-dong

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China
  • Received: 2016-11-25 Online: 2017-09-20 Published: 2017-09-15
  • About the authors: WANG Kai (born 1990), male, master's degree, research interests: information retrieval and information extraction. E-mail: wangkainlp@gmail.com. *Corresponding author: HONG Yu (born 1978), male, associate professor, research interests: topic detection, information retrieval and information extraction. E-mail: tianxianer@gmail.com
  • Funding:
    National Natural Science Foundation of China (61672368, 61672367, 61373097, 61331011)

Abstract: To satisfy a single query intent, a user often has to submit several query requests. Effectively identifying the boundaries at which the intent changes between consecutive query requests helps a retrieval system understand the user's complete query intent. This in turn improves query suggestion and query expansion, and supports the construction of user models for personalized search. Building on a thorough analysis of the effective features used in previous research, this paper proposes a method that detects intent boundaries using topic-distribution similarity, which yields improvements with both the SVM model and the CRF model. Experimental results show that the best performance of the proposed method improves the F-measure by 2% over the baseline system.

Key words: query intent, boundary detection, information retrieval

CLC number: TP391
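
The idea in the abstract, comparing the topic distributions of consecutive queries to find where the intent changes, can be illustrated with a minimal sketch. The abstract does not say which topic model, similarity measure, or feature set the authors use, so everything below is an assumption: the topic distributions are hypothetical hand-written vectors, cosine similarity is one possible choice of measure, and the fixed threshold is only a stand-in for the SVM and CRF models, which in the paper take the similarity as a feature.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic distributions."""
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def adjacent_topic_similarities(topic_dists):
    """One similarity value per pair of consecutive queries."""
    return [
        cosine_similarity(topic_dists[i], topic_dists[i + 1])
        for i in range(len(topic_dists) - 1)
    ]


def threshold_boundaries(similarities, threshold=0.5):
    """Assumed stand-in for a trained classifier.

    Index i in the result means the intent is taken to change
    between query i and query i + 1.
    """
    return [i for i, s in enumerate(similarities) if s < threshold]


# Hypothetical topic distributions (3 topics) for 4 consecutive queries.
session = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
]

sims = adjacent_topic_similarities(session)
boundaries = threshold_boundaries(sims)
print(sims)
print(boundaries)
```

On this toy session the only pair whose similarity falls below the threshold is the second and third query, so a single boundary is flagged there. In a setup closer to the one the abstract describes, the values in `sims` would be added to the other features and passed to a trained classifier or sequence labeller rather than compared against a fixed threshold.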