Journal of Shandong University (Natural Science) ›› 2017, Vol. 52 ›› Issue (7): 59-65. doi: 10.6040/j.issn.1671-9352.1.2016.PC2
ZHANG Cong (张聪), PEI Jia-huan (裴家欢), HUANG Kai-yu (黄锴宇), HUANG De-gen* (黄德根), YIN Zhang-zhi (殷章志)
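Abstract: Microblog opinion summarization, which aims to efficiently extract the key information on different topics from massive volumes of microblog posts, has recently become an active research topic in natural language processing. The baseline method uses the TF-IDF algorithm to extract keywords from microblog sentences, computes an importance score for each microblog from those keywords, and selects the opinion summary directly from the scores. The naive improved method adds a sentiment classification step to the baseline and uses the semantic distance between microblog sentences to remove from the candidate set those sentences that are semantically redundant and less important, and then generates the opinion summary. The method based on a semantic graph optimization algorithm builds on the naive improved method: it constructs a semantic graph from the importance scores of the microblog sentences and the semantic distances between them, and selects the opinion summary with a graph optimization algorithm. On the test dataset of COAE2016 Task 1, the naive improved method achieves, averaged over 10 topics, a ROUGE-1 of 26.39%, a ROUGE-2 of 0.68%, and a ROUGE-SU4 of 5.69%; the officially published evaluation results show that it obtained the best performance on 6 of the 9 evaluation metrics. The method based on the semantic graph optimization algorithm was tested on the evaluation sample dataset, where it improves on the naive improved method by 0.63%, 1.51%, and 2.69% in ROUGE-1, ROUGE-2, and ROUGE-SU4, respectively.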
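The scoring and redundancy-removal steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: posts are assumed to be already tokenized (Chinese text would first need a word segmenter), the importance score is taken to be the sum of a post's TF-IDF weights, cosine distance over TF-IDF vectors stands in for the semantic distance (which the abstract does not define), and the function names and the `min_distance` threshold are hypothetical. The sentiment classification step and the graph optimization variant are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(posts):
    """Return one {term: tf-idf weight} dict per tokenized post."""
    n = len(posts)
    # Document frequency: number of posts that contain each term.
    df = Counter(term for post in posts for term in set(post))
    vectors = []
    for post in posts:
        tf = Counter(post)
        vectors.append({
            term: (count / len(post)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


def importance(vector):
    """Assumed scoring rule: sum of the post's TF-IDF weights."""
    return sum(vector.values())


def cosine_distance(u, v):
    """1 - cosine similarity; a stand-in for the paper's semantic distance."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_summary(posts, k=2, min_distance=0.5):
    """Pick up to k post indices, most important first.

    A post is skipped when it lies closer than min_distance (hypothetical
    threshold) to a post that has already been selected.
    """
    vectors = tfidf_vectors(posts)
    ranked = sorted(range(len(posts)),
                    key=lambda i: importance(vectors[i]),
                    reverse=True)
    chosen = []
    for i in ranked:
        if all(cosine_distance(vectors[i], vectors[j]) >= min_distance
               for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return chosen


posts = [
    ["battery", "life", "is", "great"],
    ["battery", "life", "is", "great"],  # exact duplicate of the first post
    ["screen", "too", "dim"],
]
summary = select_summary(posts, k=3)
```

With these three toy posts, the duplicate is dropped even though `k=3` would allow it, because its distance to the already selected copy is below the threshold. Setting `min_distance=0` disables the filter and reduces the sketch to the score-only baseline.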