JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2015, Vol. 50, Issue (01): 31-36. doi: 10.6040/j.issn.1671-9352.3.2014.033

Search result diversification via data fusion

HUANG Chun-lan, WU Sheng-li   

  1. School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, Jiangsu, China
Received: 2014-09-05; Revised: 2014-11-25; Online: 2015-01-20; Published: 2015-01-24

Abstract: Information retrieval systems need to consider both the relevance and the diversity of the documents they retrieve. This paper approaches search result diversification from a different perspective, by discussing how data fusion methods can be applied to it. For the linear combination method in particular, the strategy for assigning weights to component systems is reexamined. Both the effectiveness of the component retrieval systems and the dissimilarity between them are taken into account, and a simple, convenient method for calculating dissimilarity based on the set coverage rate is proposed. A linear combination method that uses this weight assignment can improve the diversity of the fused results. Experiments were carried out with three groups of top-ranked runs submitted to the TREC web diversity task. The results show that data fusion remains a useful approach for improving diversity, just as it has previously been shown to be for relevance.

Key words: weight assignment, search result diversification, data fusion, linear combination
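The abstract does not give the exact formulas, so the following Python sketch only illustrates the general idea of weighted linear combination under stated assumptions. It is not the authors' method. The function names `overlap_rate`, `dissimilarity` and `linear_combination` are hypothetical; the overlap measure (the share of one run's top-k documents that also appear in another run's top-k) is an assumed stand-in for the set coverage rate; the weight is assumed to be effectiveness multiplied by dissimilarity; and a linear rank-based score is assumed in place of the systems' normalized retrieval scores.

```python
from typing import Dict, List


def overlap_rate(run_a: List[str], run_b: List[str], k: int) -> float:
    """Share of run_a's top-k documents that also occur in run_b's top-k.

    Hypothetical stand-in for the paper's set coverage rate.
    """
    top_a, top_b = set(run_a[:k]), set(run_b[:k])
    if not top_a:
        return 0.0
    return len(top_a & top_b) / len(top_a)


def dissimilarity(runs: Dict[str, List[str]], k: int) -> Dict[str, float]:
    """Dissimilarity of each run: 1 minus its mean overlap rate with the other runs."""
    result: Dict[str, float] = {}
    for name, run in runs.items():
        others = [r for other, r in runs.items() if other != name]
        if not others:
            result[name] = 1.0  # a lone run has nothing to overlap with
            continue
        mean_overlap = sum(overlap_rate(run, o, k) for o in others) / len(others)
        result[name] = 1.0 - mean_overlap
    return result


def linear_combination(
    runs: Dict[str, List[str]],
    effectiveness: Dict[str, float],
    k: int = 10,
) -> List[str]:
    """Fuse ranked runs with weight = effectiveness * dissimilarity (assumed form)."""
    dis = dissimilarity(runs, k)
    weights = {name: effectiveness[name] * dis[name] for name in runs}
    fused: Dict[str, float] = {}
    for name, run in runs.items():
        n = len(run)
        for rank, doc in enumerate(run):
            # Assumed linear rank-based score: 1.0 at the top, 1/n at the bottom.
            score = (n - rank) / n
            fused[doc] = fused.get(doc, 0.0) + weights[name] * score
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    runs = {
        "A": ["d1", "d2", "d3", "d4"],
        "B": ["d1", "d2", "d5", "d6"],
        "C": ["d7", "d8", "d1", "d9"],
    }
    # Hypothetical per-run effectiveness, e.g. a diversity metric on training queries.
    effectiveness = {"A": 0.5, "B": 0.4, "C": 0.3}
    print(linear_combination(runs, effectiveness, k=4))
    # ['d1', 'd2', 'd7', 'd8', 'd3', 'd5', 'd4', 'd6', 'd9']
```

In this toy example run C is the least effective but also the most dissimilar (dissimilarity 0.75 versus 0.625 for A and B), so its otherwise unseen documents d7 and d8 still rank near the top of the fused list. Multiplying the two factors is just one plausible way to reward runs that are both good and contribute documents the others miss; the paper may combine them differently.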

CLC Number: TP391