
Journal of Shandong University (Natural Science) ›› 2025, Vol. 60 ›› Issue (9): 71-86. doi: 10.6040/j.issn.1671-9352.0.2024.039


Climax chapter recognition method for Chinese long novels based on plot description

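WANG Wenjing1, LIU Zhongbao2*, WAN Guangwen2, HU Jianan3

  1. College of Information Engineering, Shanxi Vocational University of Engineering Science and Technology, Taiyuan 030619, Shanxi, China; 2. School of Information Science, Beijing Language and Culture University, Beijing 100083, China; 3. School of Software, North University of China, Taiyuan 030051, Shanxi, China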
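  • Published: 2025-09-10
  • Corresponding author: LIU Zhongbao (1981- ), male, professor, Ph.D.; research interests: digital humanities and cultural digitization. E-mail: liuzb@nuc.edu.cn
  • About the author: WANG Wenjing (1981- ), female, associate professor, master's degree; research interest: intelligent computing. E-mail: 806214106@qq.com
  • Funding:
    Key Project of the National Social Science Fund of China, "Research on the theory, methods, and paths of revitalizing ancient books to empower cultural confidence and self-improvement in the era of big data" (23AZD047)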



Abstract: Quickly and accurately identifying climax chapters has become a common problem for readers choosing what to read. To address it, this paper explores a method for recognizing the climax chapters of Chinese long novels that builds on an accurate characterization of their plots. The method consists of two parts: key element extraction and climax chapter recognition. The former extracts key elements such as viewpoint passages, non-viewpoint passages, chapter keywords, and main characters. The latter builds a chapter plot description matrix and then applies a BiGRU model with a multi-head attention mechanism to recognize climax chapters. Comparative experiments on a corpus of Jin Yong's novels show that the proposed method achieves better recognition performance than naive Bayes (NB), support vector machine (SVM), the pre-trained model RoBERTa-large, and bi-directional long short-term memory (BiLSTM) models. Ablation experiments confirm the effectiveness of the method's main components.

Key words: Chinese long novel, key element extraction, chapter plot description matrix, climax chapter recognition
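
The abstract names the recognizer's building blocks (a chapter plot description matrix, a BiGRU, and multi-head attention) without giving details. As a rough illustration of how such a pipeline could fit together, here is a minimal pure-Python sketch. It is not the authors' implementation: the matrix layout (one row per extracted key element), the dimensions, the mean pooling, the sigmoid output, and every function name are assumptions, and random weights stand in for trained parameters.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Random weights stand in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class GRUCell:
    """Bias-free GRU cell; each gate reads the concatenation [x; h]."""

    def __init__(self, in_dim, hid_dim, rng):
        self.Wz = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # update gate
        self.Wr = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # reset gate
        self.Wh = rand_matrix(hid_dim, in_dim + hid_dim, rng)  # candidate state

    def step(self, x, h):
        z = [sigmoid(v) for v in matvec(self.Wz, x + h)]
        r = [sigmoid(v) for v in matvec(self.Wr, x + h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in matvec(self.Wh, x + gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def bigru(rows, fwd, bwd, hid_dim):
    """Run one GRU left-to-right and one right-to-left, then concatenate per row."""
    def run(cell, seq):
        h, states = [0.0] * hid_dim, []
        for x in seq:
            h = cell.step(x, h)
            states.append(h)
        return states

    forward = run(fwd, rows)
    backward = run(bwd, rows[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


def multi_head_attention(states, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention, computed per head on a slice of Q, K, V."""
    head_dim = len(states[0]) // num_heads
    Q = [matvec(Wq, s) for s in states]
    K = [matvec(Wk, s) for s in states]
    V = [matvec(Wv, s) for s in states]
    out = []
    for i in range(len(states)):
        merged = []
        for k in range(num_heads):
            lo, hi = k * head_dim, (k + 1) * head_dim
            scores = [
                sum(a * b for a, b in zip(Q[i][lo:hi], key[lo:hi])) / math.sqrt(head_dim)
                for key in K
            ]
            weights = softmax(scores)
            merged += [sum(w * v[j] for w, v in zip(weights, V)) for j in range(lo, hi)]
        out.append(matvec(Wo, merged))  # output projection over concatenated heads
    return out


def climax_probability(matrix, hid_dim=4, num_heads=2, seed=0):
    """Score one chapter's plot description matrix (rows = key-element vectors)."""
    rng = random.Random(seed)
    in_dim, dim = len(matrix[0]), 2 * hid_dim
    fwd, bwd = GRUCell(in_dim, hid_dim, rng), GRUCell(in_dim, hid_dim, rng)
    Wq, Wk, Wv, Wo = (rand_matrix(dim, dim, rng) for _ in range(4))
    w_out = rand_matrix(1, dim, rng)[0]

    states = bigru(matrix, fwd, bwd, hid_dim)
    attended = multi_head_attention(states, Wq, Wk, Wv, Wo, num_heads)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]  # mean over rows
    return sigmoid(sum(w * p for w, p in zip(w_out, pooled)))


# Hypothetical 4-row matrix: viewpoint passages, non-viewpoint passages,
# chapter keywords, main characters, each already embedded as a 3-d vector.
chapter_matrix = [
    [0.2, 0.7, 0.1],
    [0.5, 0.1, 0.4],
    [0.9, 0.3, 0.2],
    [0.4, 0.4, 0.8],
]
print(climax_probability(chapter_matrix))
```

With untrained weights the printed value carries no meaning; the sketch only shows the shape of the computation. In practice one would use a deep learning framework and learn the weights from labeled chapters.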

CLC number: TP391