JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2016, Vol. 51, Issue 11: 26-32. doi: 10.6040/j.issn.1671-9352.1.2015.E14

Algorithm of knowledge base cumulative citation recommendation based on semantic features expansion

XU Ye, XU Wei-ran   

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2015-09-18 Online: 2016-11-20 Published: 2016-11-22

Abstract: The task of knowledge base cumulative citation recommendation is decomposed into three key problems: query expansion based on entity names in the knowledge base, feature extraction for documents and entities, and document classification. We propose a method that combines a semantic dictionary (DBpedia) with word vectors (word embeddings) for query expansion, and uses the LDA and ESA algorithms for feature extraction. Documents are then classified by a linear logistic regression model combined with a nonlinear random forest. On TREC KBA 2014, the F1 score of the system improved by 14.7% over the baseline, which indicates that the proposed method is effective for the citation recommendation task.
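The query expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the alias table `ALIASES` is a hypothetical stand-in for a DBpedia lookup, `VECTORS` holds toy three-dimensional vectors in place of trained word embeddings, and the function name `expand_query` and its parameters `top_k` and `min_sim` are assumptions made for illustration only.

```python
import math

# Hypothetical alias table standing in for a DBpedia lookup
# (surface forms and redirects of an entity). Assumption, not real data.
ALIASES = {
    "Barack Obama": ["Obama", "Barack H. Obama"],
}

# Toy word vectors standing in for trained word embeddings. Assumption.
VECTORS = {
    "obama": [0.9, 0.1, 0.0],
    "president": [0.8, 0.3, 0.1],
    "senator": [0.7, 0.4, 0.2],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(entity, seed_word, top_k=2, min_sim=0.8):
    """Expand an entity name with dictionary aliases and embedding neighbours.

    Dictionary terms come first; then up to top_k words whose vectors are
    closest to the seed word, kept only if their similarity is >= min_sim.
    """
    terms = [entity] + ALIASES.get(entity, [])
    seed = VECTORS.get(seed_word)
    if seed is None:
        # No embedding for the seed word: fall back to dictionary terms only.
        return terms
    scored = [
        (cosine(seed, vec), word)
        for word, vec in VECTORS.items()
        if word != seed_word
    ]
    scored.sort(reverse=True)
    terms += [word for sim, word in scored[:top_k] if sim >= min_sim]
    return terms


if __name__ == "__main__":
    print(expand_query("Barack Obama", "obama"))
```

In this sketch the dictionary supplies exact surface forms of the entity, while the embedding neighbours add related context words that a dictionary lookup alone would miss. The similarity threshold keeps unrelated words out of the expanded query.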

Key words: query expansion, feature extraction, knowledge base, classification

CLC Number: TP391