Journal of Shandong University (Natural Science), 2021, Vol. 56, Issue 5: 66-75. doi: 10.6040/j.issn.1671-9352.1.2020.029
WANG Wei-yu¹,², SHI Cun-hui¹,²*, YU Xiao-ming¹, LIU Yue¹, CHENG Xue-qi¹
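Abstract: Reports of the same event describe it in highly similar terms. Exploiting this property, we propose an extractive method for generating short topic representations. The method takes the titles in an event's document-title set as input, extracts from different titles the information they share while preserving its original word order, and then fuses this shared information into a short topic representation at event granularity. Experiments on event data collected from a search engine show that the method produces short topic representations that are concise, accurate, semantically clear and complete, and readable.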
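The abstract does not say which algorithm extracts the order-preserving shared information or how the pieces are fused. As an assumption only, the sketch below treats each title as a token sequence and uses the classic dynamic-programming longest common subsequence (LCS) to pull out the tokens that each pair of titles shares in the same order. It then applies a hypothetical fusion rule: keep the pairwise LCS that recurs across the most title pairs, breaking ties by length. The names `lcs`, `short_topic`, and `min_len` are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def lcs(a: Sequence[str], b: Sequence[str]) -> Tuple[str, ...]:
    """Return one longest common subsequence of two token sequences (O(len(a) * len(b)) DP)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack from the bottom-right cell; matched tokens keep their original order.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return tuple(reversed(out))


def short_topic(titles: List[List[str]], min_len: int = 2) -> Tuple[str, ...]:
    """Hypothetical fusion step (not the paper's published rule): among all pairwise
    LCSs of at least min_len tokens, return the one shared by the most title pairs,
    breaking ties by length; return () if no pair shares enough tokens."""
    counts: Counter = Counter()
    for a, b in combinations(titles, 2):
        common = lcs(a, b)
        if len(common) >= min_len:
            counts[common] += 1
    if not counts:
        return ()
    # max() returns the first maximal key, so remaining ties follow insertion order.
    return max(counts, key=lambda s: (counts[s], len(s)))


# Usage with pre-tokenized (illustrative) titles of one event.
titles = [
    "earthquake hits southern city , dozens injured".split(),
    "magnitude 6 earthquake hits southern city".split(),
    "southern city earthquake : dozens injured".split(),
]
print(" ".join(short_topic(titles)))  # earthquake hits southern city
```

For Chinese titles, a list of characters or the output of a word segmenter could serve as the tokens. The quadratic cost of the DP per pair is small for short inputs such as titles.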