Journal of Shandong University (Natural Science), 2015, Vol. 50, Issue (01): 31-36. doi: 10.6040/j.issn.1671-9352.3.2014.033
黄春兰, 吴胜利
HUANG Chun-lan, WU Sheng-li
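Abstract: An information retrieval system must consider not only the relevance of documents but also their diversity and novelty. To address the diversification of retrieval results, this paper examines whether data fusion methods are applicable to search result diversification. For the linear combination method, the strategy for assigning weights to the component systems is re-examined. By taking into account both the effectiveness of each component retrieval system and the dissimilarity among the component systems, a simple and convenient method based on set coverage is proposed, so that linear combination with this weight assignment improves the diversity of the fused results. Experiments use three groups of data from the Web track diversity task of the Text REtrieval Conference (TREC). The results show that, in terms of diversity, the proposed data fusion methods consistently improve retrieval performance and outperform the best component retrieval system.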
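The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each component system's weight is its effectiveness multiplied by a dissimilarity score, and that dissimilarity is the average share of a system's top-k documents that other systems fail to return in their own top-k. The function names, the toy runs, and the effectiveness values are all hypothetical.

```python
from typing import Dict, List

Run = Dict[str, float]  # document id -> normalized score in [0, 1]


def overlap_dissimilarity(runs: List[Run], i: int, k: int = 3) -> float:
    """Average fraction of run i's top-k documents absent from each other run's top-k.

    Hypothetical stand-in for a set-coverage measure (an assumption,
    not the formula from the paper).
    """
    def top_k(run: Run) -> set:
        # Sort by descending score, breaking ties by document id.
        return set(sorted(run, key=lambda d: (-run[d], d))[:k])

    mine = top_k(runs[i])
    others = [top_k(r) for j, r in enumerate(runs) if j != i]
    if not mine or not others:
        return 0.0
    return sum(len(mine - o) / len(mine) for o in others) / len(others)


def linear_combination(runs: List[Run], weights: List[float]) -> List[str]:
    """Fuse runs with score(d) = sum_i w_i * s_i(d); a missing document scores 0."""
    fused: Dict[str, float] = {}
    for run, w in zip(runs, weights):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + w * score
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy data, invented for illustration only.
runs = [
    {"d1": 1.0, "d2": 0.8, "d3": 0.5},
    {"d1": 0.9, "d2": 0.7, "d4": 0.6},
    {"d5": 1.0, "d6": 0.9, "d1": 0.3},
]
# Assumed per-system effectiveness, e.g. a diversity metric on training queries.
effectiveness = [0.5, 0.4, 0.3]
weights = [e * overlap_dissimilarity(runs, i) for i, e in enumerate(effectiveness)]
ranking = linear_combination(runs, weights)
print(ranking)
```

In this toy setup the third system is the least effective but returns the most documents that the others miss, so its weight is raised relative to a purely effectiveness-based assignment and its unique documents move up in the fused list.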