
Journal of Shandong University (Natural Science) ›› 2016, Vol. 51 ›› Issue (11): 26-32. doi: 10.6040/j.issn.1671-9352.1.2015.E14


Algorithm of knowledge base cumulative citation recommendation based on semantic features expansion

XU Ye, XU Wei-ran

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2015-09-18 Online: 2016-11-20 Published: 2016-11-22
  • About the author: XU Ye (1990- ), male, master's student; main research interest: information extraction. E-mail: bob.ye.xu@gmail.com

Abstract: The knowledge base cumulative citation recommendation (CCR) task is decomposed into three basic key problems: query expansion for an entity name in the knowledge base; feature extraction for documents and entities; and a classification model that combines linear and nonlinear classifiers. We propose a query expansion method that combines a semantic dictionary (DBpedia) with word vectors (word embedding), and we use two algorithms, LDA and ESA, to extract features from documents. Finally, the CCR algorithm is implemented with a classifier that fuses linear logistic regression with nonlinear random forest. Compared with the baseline system, the method improves the average F1 on the TREC KBA 2014 evaluation data by 14.7%, which indicates that the proposed method handles the citation recommendation problem well.

Key words: query expansion, feature extraction, knowledge base, classification

CLC Number:

  • TP391
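
The abstract describes query expansion that combines a semantic dictionary (DBpedia) with word embeddings, but it gives no implementation detail. The following is only a minimal sketch of one way such a combination could work, not the authors' method: dictionary aliases are added first, then the nearest embedding neighbours by cosine similarity. The alias table, the toy vectors, the function `expand_query`, and the parameters `top_k` and `min_sim` are all hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(entity, aliases, vectors, top_k=2, min_sim=0.5):
    """Return the entity name, its dictionary aliases, and embedding neighbours.

    aliases: dict mapping an entity name to alternative names
             (a stand-in for a DBpedia-style lookup; assumption).
    vectors: dict mapping a word to its embedding (toy values; assumption).
    """
    terms = [entity] + [a for a in aliases.get(entity, []) if a != entity]
    seen = set(terms)
    if entity in vectors:
        # Rank every other word by similarity to the entity vector.
        scored = sorted(
            ((cosine(vectors[entity], vec), word)
             for word, vec in vectors.items() if word not in seen),
            reverse=True,
        )
        # Keep at most top_k neighbours that clear the similarity threshold.
        terms += [word for sim, word in scored[:top_k] if sim >= min_sim]
    return terms


# Toy data, invented purely for illustration.
ALIASES = {"jaguar": ["jaguar cars"]}
VECTORS = {
    "jaguar": [0.9, 0.1, 0.0],
    "sedan": [0.8, 0.2, 0.1],
    "coupe": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}

print(expand_query("jaguar", ALIASES, VECTORS))
```

In this sketch the similarity threshold keeps unrelated words out of the expanded query even when `top_k` is large; a real system would tune both values on held-out data.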