Journal of Shandong University (Natural Science), 2015, Vol. 50, Issue (01): 31-36. doi: 10.6040/j.issn.1671-9352.3.2014.033

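Search result diversification via data fusion

HUANG Chun-lan, WU Sheng-li

  1. School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, Jiangsu, China
  • Received: 2014-09-05  Revised: 2014-11-25  Online: 2015-01-20  Published: 2015-01-24
  • Corresponding author: WU Sheng-li (1963-), male, professor; research interests: databases and information systems, machine learning. E-mail: swu@ujs.edu.cn
  • About the author: HUANG Chun-lan (1990-), female, master's student; research interest: information retrieval. E-mail: palaceo77@163.com
  • Funding:
    Jiangsu Specially-Appointed Professor Program (1221170037, 1221170038); Start-up Fund for Specially-Appointed Professors of Jiangsu University (1281170024, 1281170025)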

Abstract: Information retrieval systems need to consider not only the relevance of retrieved documents but also their diversity and novelty. Addressing search result diversification, this paper examines whether data fusion methods are applicable to the problem. For the linear combination method in particular, the strategy for assigning weights to component systems is reexamined. Taking into account both the effectiveness of each component retrieval system and the dissimilarity between component systems, a simple and convenient method for measuring dissimilarity based on set coverage rate is proposed, so that a linear combination method using this weight assignment improves the diversity of the fused results. Experiments were carried out on 3 groups of top-ranked runs submitted to the Web track diversity task of TREC (Text REtrieval Conference). The results show that, in terms of diversity, the proposed data fusion methods consistently improve retrieval performance and outperform the best component retrieval system. Data fusion is thus a useful approach to improving diversity, as it has previously proven to be for relevance.

Key words: weight assignment, search result diversification, data fusion, linear combination
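The weighting idea summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption for illustration only: scores are min-max normalized; a system's "set coverage" dissimilarity is taken as 1 minus the average fraction of its top-k documents that also appear in the other systems' top-k; and a system's weight is its effectiveness multiplied by that dissimilarity. The function names `normalize`, `coverage_dissimilarity` and `linear_fusion`, as well as the toy runs and effectiveness values, are hypothetical.

```python
# Minimal sketch of linear-combination data fusion with weights that reward both
# effectiveness and dissimilarity. The normalization, the "set coverage" measure
# and the weight formula are illustrative assumptions, not the paper's definitions.
from typing import Dict, List

Scores = Dict[str, float]  # document id -> retrieval score for one component system


def normalize(scores: Scores) -> Scores:
    """Min-max normalize one system's scores into [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def coverage_dissimilarity(rankings: Dict[str, List[str]], k: int = 10) -> Dict[str, float]:
    """Hypothetical set-coverage dissimilarity: 1 minus the average fraction of a
    system's top-k documents that also appear in each other system's top-k."""
    tops = {name: set(ranking[:k]) for name, ranking in rankings.items()}
    result: Dict[str, float] = {}
    for name, top in tops.items():
        others = [other for other in tops if other != name]
        if not others or not top:
            result[name] = 1.0  # nothing to overlap with: treat as fully dissimilar
            continue
        overlap = sum(len(top & tops[o]) / len(top) for o in others) / len(others)
        result[name] = 1.0 - overlap
    return result


def linear_fusion(runs: Dict[str, Scores], effectiveness: Dict[str, float], k: int = 10) -> List[str]:
    """Fuse runs by a weighted sum of normalized scores; weight = effectiveness * dissimilarity."""
    rankings = {name: sorted(s, key=s.get, reverse=True) for name, s in runs.items()}
    dissimilarity = coverage_dissimilarity(rankings, k)
    weights = {name: effectiveness[name] * dissimilarity[name] for name in runs}
    fused: Scores = {}
    for name, scores in runs.items():
        for doc, value in normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights[name] * value
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    # Toy data: two component runs and made-up effectiveness values
    # (for example, a diversity measure averaged over training queries).
    runs = {
        "sysA": {"d1": 3.0, "d2": 2.0, "d3": 1.0},
        "sysB": {"d2": 0.9, "d4": 0.5, "d1": 0.1},
    }
    effectiveness = {"sysA": 0.4, "sysB": 0.6}
    print(linear_fusion(runs, effectiveness, k=2))  # ['d2', 'd1', 'd4', 'd3']
```

In the toy example each run shares one of its top-2 documents with the other, so both dissimilarities are 0.5 and the weights become 0.2 and 0.3; the fused scores are d2 = 0.4, d1 = 0.2, d4 = 0.15 and d3 = 0.0. Multiplying effectiveness by dissimilarity is only one plausible way to combine the two factors; a system whose top-k set is identical to every other system's gets zero weight under this choice.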

CLC number: TP391