JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2024, Vol. 59, Issue (7): 122-130. doi: 10.6040/j.issn.1671-9352.0.2023.291

• Review

Chinese disease text classification model driven by medical knowledge

Chao LI, Wei LIAO*

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2023-06-30  Online: 2024-07-20  Published: 2024-07-15
  • Contact: Wei LIAO


This study proposes a Chinese disease text classification model that integrates a knowledge graph. First, structured knowledge from an external medical knowledge graph is introduced to obtain a knowledge-enhanced vector representation of the disease text. Second, a bidirectional long short-term memory network (BiLSTM) and a convolutional neural network (CNN) extract the global and local semantic features of the disease text, respectively, while an attention mechanism combined with both networks helps the model extract informative features more efficiently. Finally, the extracted features are concatenated and fused, and a classifier outputs the classification result. Experimental results on a Chinese disease text dataset show that the proposed model achieves a precision, recall, and F1 score (the harmonic mean of precision and recall) of 95.21%, 95.64%, and 95.42%, respectively, outperforming the other models compared.
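The fusion stage described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the form of the attention, so dot-product attention with one query vector per channel is assumed here, and all names (`attention_pool`, `classify`, `q_global`, `q_local`, `W`, `b`) and all numbers are hypothetical. Small hand-written feature lists stand in for the BiLSTM and CNN outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention_pool(features, query):
    """Sum the feature vectors, weighted by softmax(feature . query)."""
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

def classify(global_feats, local_feats, q_global, q_local, W, b):
    """Pool each channel with attention, concatenate, apply a linear softmax classifier."""
    fused = attention_pool(global_feats, q_global) + attention_pool(local_feats, q_local)
    logits = [dot(row, fused) + bias for row, bias in zip(W, b)]
    return softmax(logits)

# Toy stand-ins (assumed values, not taken from the paper)
global_feats = [[0.2, 0.1], [0.4, 0.3]]   # e.g. BiLSTM states, 2 steps x 2 dims
local_feats = [[0.5], [0.1], [0.3]]       # e.g. CNN feature maps, 3 windows x 1 dim
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.3, 0.3, 0.3]]  # 3 classes x 3 fused dims
b = [0.0, 0.0, 0.0]
probs = classify(global_feats, local_feats, [1.0, 0.0], [1.0], W, b)
print(probs)  # one probability per department class
```

Concatenating the two pooled vectors keeps the global and local channels separate until the classifier, which is the simplest reading of "concatenated and fused" in the abstract.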

Key words: disease text classification, knowledge graph, CNN, BiLSTM, attention mechanism

CLC Number: TP391


Figure: DKCDM model

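Table 1

Disease texts and their corresponding departments

Disease text | Department
Hello doctor, my hepatitis B surface antigen is negative, my alanine aminotransferase (ALT) is 169 and my aspartate aminotransferase (AST) is 87. Is that normal? | Hepatology
Many people I know have this disease. Some have had their thyroid removed, and others have milder symptoms but have also become thin. What causes this disease, and how is it treated? | Endocrinology
A few days ago someone at the sports field fell and suddenly had an epileptic seizure, shaking all over. What causes epilepsy? | Neurology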


Figure: LSTM unit structure


Figure: BiLSTM model structure

Table 2

Confusion matrix

Predicted label | Actual Positive | Actual Negative
Positive | TP | FP
Negative | FN | TN
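Precision (P), recall (R), and F1 follow directly from the cells of this matrix. The sketch below computes them for a single class. The counts in the example are made up for illustration, and the function name is hypothetical. The source does not state how per-class scores are averaged across departments.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only (not from the paper)
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=5)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

TN does not appear in any of the three formulas, which is why these metrics are insensitive to the number of correctly rejected samples.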


Figure: Confusion matrix of classification results


Figure: Comparison of F1 values between different models

Table 3

Test results of the models (unit: %)

Model | P | R | F1
SVM | 90.32 | 90.64 | 90.48
TextCNN | 92.17 | 92.43 | 92.30
TextRNN | 92.25 | 92.68 | 92.46
FastText | 93.72 | 93.54 | 93.47
TextRCNN | 93.47 | 93.85 | 93.66
DKCDM | 95.21 | 95.64 | 95.42
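Since F1 is the harmonic mean of P and R, the reported rows can be cross-checked with a few lines of Python. This is a reader-side sanity check under the assumption that each F1 was computed from the P and R in the same row; the paper may instead average per-class F1 scores, in which case deviations are possible. The helper name `f1_from_pr` is hypothetical.

```python
def f1_from_pr(p, r):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * p * r / (p + r)

# (model, P, R, reported F1), copied from Table 3, in percent
rows = [
    ("SVM", 90.32, 90.64, 90.48),
    ("TextCNN", 92.17, 92.43, 92.30),
    ("TextRNN", 92.25, 92.68, 92.46),
    ("FastText", 93.72, 93.54, 93.47),
    ("TextRCNN", 93.47, 93.85, 93.66),
    ("DKCDM", 95.21, 95.64, 95.42),
]

mismatches = []
for name, p, r, reported in rows:
    recomputed = round(f1_from_pr(p, r), 2)
    print(f"{name:9s} reported={reported:.2f} recomputed={recomputed:.2f}")
    if abs(recomputed - reported) > 0.005:
        mismatches.append(name)

print("rows that differ:", mismatches)
```

Under this assumption every row reproduces its reported F1 except FastText, whose P and R give about 93.63 rather than 93.47. The source offers no explanation for the difference.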

Table 4

Ablation experiment results (unit: %)

Model | P | R | F1
Remove KG | 93.62 | 93.96 | 93.79
Remove TransE | 95.18 | 95.27 | 95.22
Remove BiLSTM_Attention | 94.76 | 94.12 | 94.44
Remove CNN_Attention | 94.17 | 94.51 | 94.34
DKCDM | 95.21 | 95.64 | 95.42
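The contribution of each component can be read off as the drop in F1 when it is removed. The short sketch below does that arithmetic on the values in Table 4; the variable names are illustrative only.

```python
# F1 scores (percent) copied from Table 4
full_model_f1 = 95.42
ablated_f1 = {
    "Remove KG": 93.79,
    "Remove TransE": 95.22,
    "Remove BiLSTM_Attention": 94.44,
    "Remove CNN_Attention": 94.34,
}

# Drop in F1 relative to the full DKCDM model, rounded to two decimals
drops = {name: round(full_model_f1 - f1, 2) for name, f1 in ablated_f1.items()}

# Largest drop first: the component whose removal hurts most
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)
for name, drop in ranked:
    print(f"{name:24s} -{drop:.2f}")
```

By this measure, removing the knowledge graph costs the most F1 (1.63 points) and removing TransE the least (0.20 points).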