
Journal of Shandong University (Natural Science) ›› 2024, Vol. 59 ›› Issue (7): 53-63. doi: 10.6040/j.issn.1671-9352.1.2023.080

• Review •

A document-level event extraction method based on core arguments

Chengjie SUN, Zongwei LI, Lili SHAN, Lei LIN

  1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
  • Received: 2023-10-18 Online: 2024-07-20 Published: 2024-07-15
  • About the author: SUN Chengjie (1980—), male, associate professor, PhD; research interests: natural language processing, information extraction, and dialogue systems. E-mail: sunchengjie@hit.edu.cn
  • Funding:
    National Key R&D Program of China (2021YFF0901600); National Natural Science Foundation of China (62176074); Harbin Institute of Technology Emerging Interdisciplinary "Rongtuo Plan" (SYL-JC-202203)



Abstract:

A document-level event extraction method based on core arguments (CA-DocEE) is proposed. The method defines selection criteria for core arguments based on how arguments are distributed across document-level events, uses a heterogeneous graph convolutional network to enrich argument-entity encodings with document-level context, and applies a machine reading comprehension approach that captures deep semantic information in sentences to classify argument roles. On a public document-level event extraction dataset, the proposed method achieves a micro-averaged F1 score of 80.1%, comparable to the best currently known methods.
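The encoding step can be illustrated with a heterogeneous graph convolution that applies a separate transform per edge type (e.g. sentence–mention or mention–mention edges) and sums the typed messages. This is a minimal NumPy sketch under those assumptions, not the authors' implementation; the node/edge types, dimensions, and ReLU activation are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def hetero_gcn_layer(h, adjs, weights, w_self):
    """One heterogeneous graph convolution step.

    h:          (n, d) node states
    adjs[t]:    (n, n) row-normalized adjacency for edge type t
    weights[t]: (d, d) transform for edge type t
    w_self:     (d, d) self-loop transform
    """
    out = h @ w_self
    for adj, w in zip(adjs, weights):
        out = out + adj @ (h @ w)   # aggregate each typed neighborhood
    return np.maximum(out, 0.0)     # ReLU
```

Stacking a few such layers lets each entity-mention node absorb context from sentences elsewhere in the document before role classification.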

Key words: event extraction, document-level event extraction, machine reading comprehension, graph convolutional neural network

CLC number:

  • TP391.1

Figure 1

Architecture of the document-level event extraction model based on the argument relation graph

Figure 2

Example of an argument relation graph

Figure 3

The argument role classification model
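The role classifier follows a machine reading comprehension formulation: each (event type, role) pair is framed as a question about a candidate argument against the document context. A hypothetical sketch of how such an input could be assembled (the question template, special tokens, and example event type are assumptions, not the paper's actual prompts):

```python
def build_mrc_input(event_type: str, role: str, candidate: str, document: str) -> str:
    """Frame argument-role classification as a reading-comprehension query:
    the encoder then judges whether `candidate` fills `role` in the event."""
    question = f"In the {event_type} event, is '{candidate}' the {role}?"
    return f"[CLS] {question} [SEP] {document} [SEP]"
```

A pre-trained encoder would consume this paired input and emit a yes/no (or role-label) decision per candidate, which is how the sentence-level semantics reach the classifier.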

Table 1

Comparison of argument entity mention recognition results  Unit: %

Argument mention recognition model  micro-P  micro-R  micro-F1
BiLSTM+CRF                          88.0     82.9     85.4
BERT+MCRF                           91.0     91.2     91.1
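The micro-averaged metrics reported here pool true/false positives over all classes and documents before computing the ratios, rather than averaging per-class scores. A small illustrative sketch (not the benchmark's official scorer):

```python
def micro_prf(gold: list, pred: list) -> tuple:
    """gold/pred: per-document sets of (entity, role) pairs.
    Counts are pooled globally, then precision/recall/F1 are computed once."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # predicted pairs that match the gold annotation
        fp += len(p - g)   # spurious predictions
        fn += len(g - p)   # missed gold pairs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Micro-averaging weights every argument equally, so frequent roles dominate the score, which suits extraction benchmarks with skewed role distributions.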

Table 2

Overall results comparison (S: single-event documents; M: multi-event documents)  Unit: %

Model               S     M     All
DCFEE-O             72.4  52.4  63.2
GIT                 86.8  72.3  79.9
PTPCG               88.2  69.1  79.4
PTPCG (reproduced)  86.2  68.6  78.5
CA-DocEE            88.0  70.3  80.1

Table 3

Ablation study results  Unit: %

Model                                               micro-P  micro-R  micro-F1
Pseudo-triggers of Zhu et al.[1] as core arguments  82.7     76.8     79.6
-MRC                                                83.2     76.9     79.9
CA-DocEE                                            83.7     77.4     80.1
1 ZHU Tong, QU Xiaoye, CHEN Wenliang, et al. Efficient document-level event extraction via pseudo-trigger-aware pruned complete graph[EB/OL]. (2021-12-11)[2023-10-18]. http://arxiv.org/abs/2112.06013.
2 RILOFF E. Automatically constructing a dictionary for information extraction tasks[C]//AAAI'93: Proceedings of the Eleventh National Conference on Artificial Intelligence. Washington: AAAI Press, 1993: 811-816.
3 AHN D. The stages of event extraction[C]// Proceedings of the Workshop on Annotating and Reasoning about Time and Events. Stroudsburg: Association for Computational Linguistics, 2006: 1-8.
4 CHEN Zheng, JI Heng. Language specific issue and feature exploration in Chinese event extraction[C]//Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers (NAACL '09). Morristown: Association for Computational Linguistics, 2009: 209-212.
5 LIAO S S, GRISHMAN R. Using document level cross-event inference to improve event extraction[C]//Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala: Association for Computational Linguistics, 2010: 789-797.
6 JI H, GRISHMAN R. Refining event extraction through cross-document inference[C]//Proceedings of ACL-08: HLT. Columbus: Association for Computational Linguistics, 2008: 254-262.
7 CHEN Yubo, XU Liheng, LIU Kang, et al. Event extraction via dynamic multi-pooling convolutional neural networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing: Association for Computational Linguistics, 2015: 167-176.
8 YANG Sen, FENG Dawei, QIAO Linbo, et al. Exploring pre-trained language models for event extraction and generation[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: Association for Computational Linguistics, 2019: 5284-5294.
9 SHA L, LIU J, LIN C Y, et al. RBPB: regularization-based pattern balancing method for event extraction[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin: Association for Computational Linguistics, 2016: 1224-1234.
10 DU X Y, CARDIE C. Event extraction by answering (almost) natural questions[EB/OL]. (2020-04-28)[2023-10-18]. http://arxiv.org/abs/2004.13625.
11 CHEN C, NG V. Joint modeling for Chinese event extraction with rich linguistic features[C]//Proceedings of COLING 2012. Mumbai: The COLING 2012 Organizing Committee, 2012: 529-544.
12 LI Qi, JI Heng, HUANG Liang, et al. Joint event extraction via structured prediction with global features[C]//Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia: Association for Computational Linguistics, 2013: 73-82.
13 NGUYEN T H, CHO K, GRISHMAN R. Joint event extraction via recurrent neural networks[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego: Association for Computational Linguistics, 2016: 300-309.
14 CHEN Yubo, LIU Shulin, ZHANG Xiang, et al. Automatically labeled data generation for large scale event extraction[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver: Association for Computational Linguistics, 2017: 409-419.
15 LI Qian, PENG Hao, LI Jianxin, et al. Reinforcement learning-based dialogue guided event extraction to exploit argument relations[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 30: 520-533.
doi: 10.1109/TASLP.2021.3138670
16 YANG Hang, CHEN Yubo, LIU Kang, et al. DCFEE: a document-level Chinese financial event extraction system based on automatically labeled training data[C]// Proceedings of ACL 2018, System Demonstrations. Melbourne: Association for Computational Linguistics, 2018: 50-55.
17 WANG Haitao, ZHU Tong, WANG Mingtao, et al. A prior information enhanced extraction framework for document-level financial event extraction[J]. Data Intelligence, 2021, 3(3): 460-476.
doi: 10.1162/dint_a_00103
18 ZHANG Hongkuan, SONG Hui, WANG Shuyi, et al. A BERT-based end-to-end model for Chinese document-level event extraction[C]//Proceedings of the 19th Chinese National Conference on Computational Linguistics. Haikou: Chinese Information Processing Society of China, 2020: 390-401.
19 YANG Hang, SUI Dianbo, CHEN Yubo, et al. Document-level event extraction via parallel prediction networks[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2021: 6298-6308.
20 WANG Peng, DENG Zhenkai, CUI Ruilong. TDJEE: a document-level joint model for financial event extraction[J]. Electronics, 2021, 10(7): 824.
doi: 10.3390/electronics10070824
21 LIU Jian, LIANG Chen, XU Jinan. Document-level event argument extraction with self-augmentation and a cross-domain joint training mechanism[J]. Knowledge-Based Systems, 2022, 257: 109904.
doi: 10.1016/j.knosys.2022.109904
22 ZHENG Shun, CAO Wei, XU Wei, et al. Doc2EDAG: an end-to-end document-level framework for Chinese financial event extraction[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong: Association for Computational Linguistics, 2019: 337-346.
23 XU Runxin, LIU Tianyu, LI Lei, et al. Document-level event extraction via heterogeneous graph-based interaction model with a tracker[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2021: 3533-3546.
24 WEI Tianwen, QI Jianwei, HE Shenghuan, et al. Masked conditional random fields for sequence labeling[C]//Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: Association for Computational Linguistics, 2021: 2024-2035.
25 ZENG Shuang, XU Runxin, CHANG Baobao, et al. Double graph based reasoning for document-level relation extraction[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg: Association for Computational Linguistics, 2020: 1630-1640.
26 WANG Difeng, HU Wei, CAO Ermei, et al. Global-to-local neural networks for document-level relation extraction[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg: Association for Computational Linguistics, 2020: 3711-3721.