
Journal of Shandong University (Natural Science), 2021, Vol. 56, Issue 5: 66-75. doi: 10.6040/j.issn.1671-9352.1.2020.029


An extractive topic brief representation generation method to event

WANG Wei-yu1,2, SHI Cun-hui1,2*, YU Xiao-ming1, LIU Yue1, CHENG Xue-qi1

  1. CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
  2. University of Chinese Academy of Sciences, Beijing 100190, China
  • Publication date: 2021-05-20; Release date: 2021-05-13
  • About the authors: WANG Wei-yu (born 1995), female, master's student; research interest: natural language processing. E-mail: gogoy@qq.com. *Corresponding author: SHI Cun-hui (born 1987), male, PhD candidate, engineer; research interests: network science, information recommendation, event extraction. E-mail: shicunhui@ict.ac.cn
  • Supported by:
    Young Scientists Fund of the National Natural Science Foundation of China (61802370); Strategic Priority Research Program of the Chinese Academy of Sciences (Category A) (XDA19020400)

Abstract: News reports on the same event describe it in highly similar terms. Exploiting this property, this paper proposes an extractive method for generating brief topic representations. The method takes the titles in an event's document title set as input, extracts from different titles the common information that preserves the original word order, and then fuses this common information into a brief topic representation at event granularity. Experiments on event data from search engines show that the method generates brief topic representations that are concise, accurate, semantically clear and complete, and readable.

Key words: topic brief representation generation, extractive, event

CLC number: TP391
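
The abstract's core step, extracting common information from several titles while keeping the original word order, can be illustrated with a longest common subsequence (LCS) over tokenized titles. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `lcs` and `common_core`, the whitespace tokenization, and the sample titles are all hypothetical, and folding a pairwise LCS across titles is a simple heuristic that stands in for the paper's extraction and fusion steps.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of two token lists (classic DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])

    # Backtrack from the bottom-right corner to recover the tokens in order.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def common_core(titles):
    """Fold pairwise LCS over all titles (a heuristic, not an exact multi-sequence LCS)."""
    return reduce(lcs, titles)


# Hypothetical titles reporting the same event, tokenized on whitespace.
titles = [
    "strong earthquake hits coastal city on monday".split(),
    "strong earthquake hits city , rescue under way".split(),
    "officials say strong earthquake hits the city".split(),
]
print(" ".join(common_core(titles)))
```

Tokens shared by every title survive in their original order, which is the property the abstract relies on; for Chinese titles a word segmenter or character-level tokens would replace `split()`.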