
Journal of Shandong University (Natural Science) ›› 2016, Vol. 51 ›› Issue (11): 26-32. doi: 10.6040/j.issn.1671-9352.1.2015.E14


Algorithm of knowledge base cumulative citation recommendation based on semantic features expansion

XU Ye, XU Wei-ran   

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2015-09-18 Online: 2016-11-20 Published: 2016-11-22
  • About the author: XU Ye (born 1990), male, master's student; main research interest: information extraction. E-mail: bob.ye.xu@gmail.com

Abstract: The knowledge base cumulative citation recommendation (CCR) task is decomposed into three key problems: query expansion for an entity name in the knowledge base, feature extraction for documents and entities, and a classification model that combines linear and nonlinear classifiers. Query expansion is performed by combining a semantic dictionary (DBpedia) with word vectors (word embedding); features are extracted from documents with the LDA and ESA algorithms; and CCR is finally carried out by a classifier that fuses linear logistic regression with a nonlinear random forest. On the TREC KBA2014 evaluation data, the method improves the average F1 by 14.7% over the baseline system, indicating that it handles the citation recommendation problem well.

Key words: query expansion, feature extraction, knowledge base, classification

CLC number: 

  • TP391
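
The abstract names two ideas that are easy to illustrate in isolation: expanding an entity query with dictionary aliases plus word-embedding neighbours, and fusing the outputs of a linear and a nonlinear classifier. The sketch below is only an illustration under stated assumptions and is not the authors' implementation. The alias map stands in for a DBpedia lookup, the two-dimensional vectors stand in for trained word embeddings, and weighted averaging is an assumed fusion rule, since this page does not say how the two classifiers are combined. All names and data (`expand_query`, `fuse`, `ALIASES`, `EMBEDDINGS`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(entity, aliases, embeddings, top_k=2):
    """Return the entity name, its dictionary aliases, and the top_k
    embedding neighbours of the entity name, in that order."""
    terms = [entity] + [a for a in aliases.get(entity, []) if a != entity]
    if entity in embeddings:
        target = embeddings[entity]
        # Rank every word not already in the query by similarity to the entity.
        scored = sorted(
            ((cosine(target, vec), word)
             for word, vec in embeddings.items() if word not in terms),
            reverse=True,
        )
        terms += [word for _, word in scored[:top_k]]
    return terms


def fuse(p_linear, p_nonlinear, weight=0.5):
    """Weighted average of two classifiers' relevance probabilities
    (an assumed fusion rule, not taken from the paper)."""
    return weight * p_linear + (1.0 - weight) * p_nonlinear


# Toy data invented for illustration only.
ALIASES = {"turing": ["alan_turing"]}
EMBEDDINGS = {
    "turing": [1.0, 0.0],
    "alan_turing": [0.9, 0.1],
    "enigma": [0.8, 0.6],
    "computability": [0.6, 0.8],
    "football": [0.0, 1.0],
}

expanded = expand_query("turing", ALIASES, EMBEDDINGS, top_k=2)
print(expanded)
print(fuse(0.8, 0.6))
```

In a real system the alias map would come from the knowledge base, the vectors from a trained embedding model, and the two probabilities from fitted logistic regression and random forest models; the fusion weight would need to be tuned on held-out data.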