
Journal of Shandong University (Natural Science) ›› 2024, Vol. 59 ›› Issue (7): 122-130. doi: 10.6040/j.issn.1671-9352.0.2023.291

• Review •

Chinese disease text classification model driven by medical knowledge

Chao LI, Wei LIAO*

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2023-06-30 Online: 2024-07-20 Published: 2024-07-15
  • Corresponding author: Wei LIAO E-mail: 2057775195@qq.com; liaowei54@126.com
  • About the author: Chao LI (born 1999), male, master's student; research interests: natural language processing and text classification. E-mail: 2057775195@qq.com
  • Funding:
    National Natural Science Foundation of China (62001282)

Abstract:

This study proposes a Chinese disease text classification model driven by medical knowledge. First, structured knowledge from an external medical knowledge graph is introduced to obtain a knowledge-enhanced vector representation of the disease text. Second, a bidirectional long short-term memory network (BiLSTM) and a convolutional neural network (CNN) extract the global and local semantic features of the disease text, respectively, and each is combined with an attention mechanism to improve the efficiency with which the model extracts useful feature information. Finally, the extracted features are concatenated and fused, and a classifier outputs the classification result. Experimental results on a Chinese disease text dataset show that the proposed model achieves a precision, recall, and F1 score (the harmonic mean of precision and recall) of 95.21%, 95.64%, and 95.42%, respectively, outperforming the other models compared.

Key words: disease text classification, knowledge graph, convolutional neural network (CNN), bidirectional long short-term memory network (BiLSTM), attention mechanism
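The attention and fusion steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes the BiLSTM and CNN branch outputs are already available as lists of feature vectors, it uses simple dot-product attention against a query vector, and the function names (`attention_pool`, `fuse`) and toy inputs are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Weight each feature vector by softmax(dot(feature, query)) and sum them."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def fuse(global_feats, local_feats, q_global, q_local):
    """Attention-pool each branch, then concatenate the two pooled vectors."""
    pooled_global = attention_pool(global_feats, q_global)
    pooled_local = attention_pool(local_feats, q_local)
    return pooled_global + pooled_local  # list concatenation

# Toy inputs (assumed): 3 time steps of 2-d "BiLSTM" features
# and 2 windows of 2-d "CNN" features.
global_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local_feats = [[0.5, 0.5], [1.5, -0.5]]

# A zero query gives uniform attention weights, so pooling reduces to a mean.
fused = fuse(global_feats, local_feats, q_global=[0.0, 0.0], q_local=[0.0, 0.0])
print(fused)
```

In a trained model the query vectors would be learned parameters, and the fused vector would be passed to a classifier such as a softmax layer over the department labels.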

CLC Number: TP391

Figure 1. The DKCDM model

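Table 1. Disease texts and their corresponding departments

| Disease text | Department |
|---|---|
| Hello doctor, my hepatitis B surface antigen is negative, my ALT is 169 and my AST is 87. Is that normal? | Hepatology |
| Many people I know have this disease. Some have had their thyroid removed, and others have milder symptoms but have still lost weight. What causes this disease, and how is it treated? | Endocrinology |
| A few days ago someone on the sports field fell and then suddenly had an epileptic seizure, shaking all over. What causes epilepsy? | Neurology |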

Figure 2. LSTM cell structure

Figure 3. BiLSTM model structure

Table 2. Confusion matrix

| Predicted label | Actual Positive | Actual Negative |
|---|---|---|
| Positive | TP | FP |
| Negative | FN | TN |
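The confusion-matrix entries above determine precision, recall, and F1 through their standard definitions. The sketch below illustrates this for a single class; the counts are hypothetical and not taken from the paper. The last lines check that the F1 reported for DKCDM matches the harmonic mean of its reported precision and recall.

```python
def harmonic_mean(p, r):
    """Harmonic mean of two non-negative numbers (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # TP / predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # TP / actual positives
    return precision, recall, harmonic_mean(precision, recall)

# Hypothetical counts, chosen only for illustration.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(p, r, f1)

# Consistency check against the reported DKCDM figures (in percent).
reported_f1 = round(harmonic_mean(95.21, 95.64), 2)
print(reported_f1)  # 95.42
```

For a multi-class task such as department classification, these per-class values are usually averaged across classes; the averaging scheme is not stated here, so the sketch covers only the single-class case.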

Figure 4. Confusion matrix of the classification results

Figure 5. Comparison of F1 scores across models

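Table 3. Experimental results of model testing

| Model | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| SVM | 90.32 | 90.64 | 90.48 |
| TextCNN | 92.17 | 92.43 | 92.30 |
| TextRNN | 92.25 | 92.68 | 92.46 |
| FastText | 93.72 | 93.54 | 93.47 |
| TextRCNN | 93.47 | 93.85 | 93.66 |
| DKCDM | 95.21 | 95.64 | 95.42 |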

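Table 4. Ablation experiments

| Model | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| Remove KG | 93.62 | 93.96 | 93.79 |
| Remove TransE | 95.18 | 95.27 | 95.22 |
| Remove BiLSTM_Attention | 94.76 | 94.12 | 94.44 |
| Remove CNN_Attention | 94.17 | 94.51 | 94.34 |
| DKCDM | 95.21 | 95.64 | 95.42 |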