JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2017, Vol. 52 ›› Issue (7): 59-65. doi: 10.6040/j.issn.1671-9352.1.2016.PC2

Chinese microblog opinion summarization based on a semantic graph optimization algorithm

ZHANG Cong, PEI Jia-huan, HUANG Kai-yu, HUANG De-gen*, YIN Zhang-zhi   

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2016-11-25 Online: 2017-07-20 Published: 2017-07-07

Abstract: Microblog opinion summarization, which aims to extract the key information on different topics efficiently, has recently become a hot topic in natural language processing. The baseline method in this paper extracts keywords with the TF-IDF algorithm and computes an importance score for each microblog, from which the opinion summary is selected directly. The naive improved method adds a sentiment classification step and uses the semantic distance between microblogs to remove those of low importance and high semantic repetition before generating the opinion summary. The method based on a semantic graph optimization algorithm constructs a complete graph from the importance scores and semantic distances of the microblogs and selects the opinion summary with a graph optimization algorithm. According to the official evaluation results on the COAE2016 test dataset, the naive improved method reached average ROUGE-1, ROUGE-2 and ROUGE-SU4 values of 26.39%, 0.68% and 5.69%, respectively, over 10 topics, and achieved the highest value on 6 of the 9 evaluation metrics. In addition, experiments on the COAE2016 sample dataset show that the method based on the semantic graph optimization algorithm increased the ROUGE-1, ROUGE-2 and ROUGE-SU4 values by 0.63%, 1.51% and 2.69%, respectively.
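
The abstract gives no formulas, so the following is only a minimal sketch of the general idea behind the first two methods: score each microblog with TF-IDF weights, then skip candidates that are semantically too close to ones already chosen. Everything specific here is an assumption for illustration, not the authors' implementation: the function names, the importance score (sum of TF-IDF weights), cosine distance as the semantic distance, the greedy selection, and the `min_distance` threshold are all hypothetical. Input is assumed to be already word-segmented; the sentiment classification step and the graph optimization step are not modeled.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized microblog."""
    n = len(docs)
    # Document frequency: number of microblogs containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: relative term frequency times (log(n/df) + 1).
        vectors.append({
            term: (count / len(doc)) * (math.log(n / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_distance(a, b):
    """Assumed semantic distance: 1 - cosine similarity of two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def summarize(docs, k=2, min_distance=0.5):
    """Greedily pick up to k microblog indices by score, skipping near-duplicates."""
    vectors = tfidf_vectors(docs)
    # Assumed importance score: sum of a microblog's TF-IDF weights.
    scores = [sum(vec.values()) for vec in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in ranked:
        far_enough = all(
            cosine_distance(vectors[i], vectors[j]) >= min_distance
            for j in selected
        )
        if far_enough:
            selected.append(i)
        if len(selected) == k:
            break
    return selected


if __name__ == "__main__":
    microblogs = [
        ["phone", "battery", "great"],
        ["phone", "battery", "great"],  # exact repeat of the first post
        ["screen", "too", "dim"],
    ]
    print(summarize(microblogs, k=3))
```

In this toy input the repeated post is dropped because its distance to the already selected copy falls below the threshold. For real Chinese microblogs a word segmenter would be needed before this step.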

Key words: microblog summarization, semantic graph optimization, TF-IDF, sentence similarity

CLC Number: TP391