
Journal of Shandong University (Natural Science), 2021, Vol. 56, Issue (5): 76-84. doi: 10.6040/j.issn.1671-9352.1.2020.019


Pseudo-relevance feedback method based on locational relationship in document

WANG Xue-yan1,2,3, HE Ting-ting1,2,3*, HUANG Xiang4, WANG Jun-mei5, PAN Min6

  1. Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Wuhan 430070, Hubei, China;
  2. School of Computer, Central China Normal University, Wuhan 430070, Hubei, China;
  3. National Language Resources Monitor & Research Center for Network Media, Wuhan 430070, Hubei, China;
  4. National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430070, Hubei, China;
  5. School of Mathematics and Statistics, Central China Normal University, Wuhan 430070, Hubei, China;
  6. School of Computer and Information Engineering, Hubei Normal University, Huangshi 435000, Hubei, China
  • Published: 2021-05-13
  • About the authors: WANG Xue-yan (born 1995), female, master's student; research interest: information retrieval. E-mail: xueyanwang@mails.ccnu.edu.cn. *Corresponding author: HE Ting-ting (born 1964), female, Ph.D., professor; research interests: natural language processing, information retrieval and network media monitoring. E-mail: tthe@mail.ccnu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61532008, 61932008); Wuhan Science and Technology Program (2019010701011392); Research Center Project of the State Language Commission (ZDI135-135); Key Research and Development Program of Hubei Province (2020BAB017)

Abstract: This paper proposes LRoc, a location-based Rocchio framework for pseudo-relevance feedback, together with three variants. The framework uses different kernel functions to model the positions of candidate terms within the feedback documents, derives a positional importance score for each candidate expansion term, and integrates that score into the classic Rocchio model. When selecting and evaluating candidate expansion terms, the method therefore considers not only term frequency but also term position, which helps it obtain expansion terms that are more likely to be relevant to the query. Experiments on five standard Text REtrieval Conference (TREC) datasets show that the three models built on the LRoc framework (LRoc1, LRoc2 and LRoc3) all achieve significant improvements over the baseline model in mean average precision (MAP) and precision at 20 (P@20).

Key words: pseudo-relevance feedback, locational relationship, query expansion
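
The idea summarized in the abstract, weighting candidate expansion terms by where they occur in the feedback documents and feeding those weights into a Rocchio update, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel, the choice to measure position as the relative offset from the start of the document, the `sigma`, `alpha`, `beta` and `top_k` values, and all function names. It should not be read as LRoc1, LRoc2 or LRoc3.

```python
import math
from collections import defaultdict


def gaussian_kernel(position, doc_length, sigma=0.25):
    """Assumed kernel: 1.0 at the first token, decaying with relative offset."""
    relative = position / max(doc_length - 1, 1)
    return math.exp(-(relative ** 2) / (2 * sigma ** 2))


def positional_term_weights(doc_tokens, kernel=gaussian_kernel):
    """Sum kernel weights over every occurrence of each term in one document.

    Repeated terms accumulate weight, so both frequency and position matter.
    """
    weights = defaultdict(float)
    n = len(doc_tokens)
    for pos, term in enumerate(doc_tokens):
        weights[term] += kernel(pos, n)
    return dict(weights)


def rocchio_expand(query_vec, feedback_docs, alpha=1.0, beta=0.75, top_k=5):
    """Rocchio update (positive feedback only) over position-weighted terms."""
    centroid = defaultdict(float)
    for doc in feedback_docs:
        for term, w in positional_term_weights(doc).items():
            centroid[term] += w / len(feedback_docs)
    # Keep only the top_k highest-scoring candidate expansion terms.
    candidates = sorted(centroid.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    expanded = {t: alpha * w for t, w in query_vec.items()}
    for term, w in candidates:
        expanded[term] = expanded.get(term, 0.0) + beta * w
    return expanded


# Hypothetical toy feedback documents, already tokenized.
docs = [
    "rocchio feedback improves retrieval of documents".split(),
    "documents about cooking mention feedback rarely".split(),
]
print(rocchio_expand({"feedback": 1.0}, docs, top_k=3))
```

Swapping `gaussian_kernel` for another function with the same signature is how a different kernel could be tried in this sketch; which kernels and position definitions the paper actually uses is not stated in the abstract.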

CLC number: 

  • TP391