JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2020, Vol. 55 ›› Issue (1): 102-109. doi: 10.6040/j.issn.1671-9352.2.2019.076


BERT-IDCNN-CRF for named entity recognition in Chinese

Ni LI1, Huan-mei GUAN2,*, Piao YANG2, Wen-yong DONG2

  1. State Key Laboratory of Power Grid Environmental Protection, China Electric Power Research Institute, Wuhan 430074, Hubei, China
  2. School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
  • Received: 2019-09-02 Online: 2020-01-20 Published: 2020-01-10
  • Contact: Huan-mei GUAN


The pre-trained language model BERT (bidirectional encoder representations from transformers) has shown promising results in named entity recognition (NER) because it captures rich syntactic and grammatical information in sentences as well as the polysemy of words. However, most existing models based on BERT fine-tuning must update a large number of parameters, which incurs a high time cost in both the training and testing phases. To address this problem, this work presents a novel BERT-based model for Chinese NER, named BERT-IDCNN-CRF (BERT-iterated dilated convolutional neural network-conditional random field). The proposed model uses the standard BERT model to obtain contextual word representations, which serve as the input to IDCNN-CRF. During training, the BERT parameters are kept fixed, so the proposed model reduces the number of parameters to be trained while still representing the polysemy of words. Experimental results show that the proposed model substantially reduces training time while keeping test error acceptable.

Key words: NER in Chinese, BERT, IDCNN, CRF, information security

CLC Number: TP391


The proposed BERT-IDCNN-CRF model
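The CRF layer at the top of a model such as BERT-IDCNN-CRF chooses the tag sequence with the highest total score rather than the best tag at each position. The following is a minimal sketch of Viterbi decoding in plain Python, not the authors' implementation. The tag set, the emission scores, and the transition scores are toy values assumed purely for illustration.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (e.g. from the encoder)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score ending in each tag so far
    backpointers = []                   # best previous tag for each step and tag
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
        score = new_score
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy setup: three tags and three characters.
TAGS = ["O", "B-LOC", "I-LOC"]
transitions = [
    [0.0, 0.0, -1e4],   # O -> I-LOC is effectively forbidden
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [1.0, 0.8, 0.0],    # position 0: "O" looks best in isolation
    [0.1, 0.2, 2.0],    # position 1: strongly "I-LOC"
    [1.5, 0.0, 0.3],    # position 2: "O"
]
best_path = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in best_path])
```

In this toy case a per-position argmax would pick "O" followed by "I-LOC", which the transition scores rule out; decoding the whole sequence jointly starts the entity with "B-LOC" instead.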


BERT pre-trained language model


Transformer coding unit


Model training process


Dilated convolution diagram
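A dilated convolution applies its kernel to inputs spaced a fixed number of positions apart, so stacking a few layers covers a wide context with few parameters. The sketch below shows the operation on a 1-D sequence in plain Python. The kernel width of 3 and the dilation schedule (1, 1, 2) are illustrative assumptions, not settings taken from the paper.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1-D dilated convolution with zero padding.

    As is usual in deep learning, the kernel is not flipped
    (strictly a cross-correlation). The kernel width is assumed odd.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation   # taps are `dilation` apart
            if 0 <= idx < len(x):             # positions outside x count as 0
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_width: int, dilations: Sequence[int]) -> int:
    """Number of input positions seen by one output of the stacked layers."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2))
print(receptive_field(3, [1, 1, 2]))
```

With dilation 2, each output sums the inputs two positions to the left, at the centre, and two positions to the right. Three width-3 layers with dilations 1, 1, 2 cover 9 input positions, whereas three undilated layers would cover 7.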

Table 1

Statistics on the number of entities

Dataset        Location   Organization   Person   Total
Training set   36,517     20,571         17,615   74,703
Test set       2,877      1,331          1,973    6,181

Table 2

Experimental setting

Operating system   Ubuntu
CPU                i7-6700HQ @ 2.60 GHz
GPU                GTX 1070 (8 GB)
Python             3.6
TensorFlow         1.12.0
Memory             32 GB


Variation of F1 value in BERT-IDCNN-CRF model


Experimental results of stacking layers of different dilated convolution blocks

Table 3

Recognition results for different types of named entities

Model            Type   P       R       F1
BERT-IDCNN-CRF   LOC    96.32   93.81   95.05
BERT-IDCNN-CRF   ORG    88.86   91.06   89.94
BERT-IDCNN-CRF   PER    96.95   96.16   96.55
BERT-IDCNN-CRF   ALL    94.86   93.97   94.41
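The F1 column can be cross-checked against P and R, assuming the table uses the standard definition F1 = 2PR / (P + R). The short sketch below recomputes it from the rows of Table 3; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from Table 3.
table3 = {
    "LOC": (96.32, 93.81, 95.05),
    "ORG": (88.86, 91.06, 89.94),
    "PER": (96.95, 96.16, 96.55),
    "ALL": (94.86, 93.97, 94.41),
}

for entity_type, (p, r, reported) in table3.items():
    print(f"{entity_type}: recomputed {f1_score(p, r):.2f}, reported {reported:.2f}")
```

Recomputed values can differ from the reported ones in the last digit, because the published P and R are themselves rounded.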

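Table 4

Examples of prediction errors

Example     Field              Content
Example 1   Sentence           中国政府陪同团 (Chinese government accompanying delegation)
Example 1   Entity             中国政府陪同团-ORG
Example 1   Predicted entity   中国-LOC
Example 2   Sentence           委员会的安全任务更加繁重了 (The committee's security tasks have become more onerous)
Example 2   Entity             委员会-ORG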


BERT-fine-tuning experimental results

Table 5

Named entity recognition results for different models

Model                P       R       F1      Time(ep)
BiLSTM-CRF           88.80   87.16   87.97   416
IDCNN-CRF            89.39   84.64   86.95   209
Radical-BiLSTM-CRF   91.28   90.62   90.95   >410
Lattice-LSTM-CRF     93.57   92.79   93.18   7,506
BERT-fine-tuning     94.09   94.54   95.37   1,363
BERT-IDCNN-CRF       94.86   93.97   94.41   216
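To put the Time(ep) column in perspective, the sketch below expresses each model's time relative to BERT-IDCNN-CRF. Reading Time(ep) as training time per epoch is an assumption, and the source does not state the unit; the ratios do not depend on it.

```python
# Time(ep) values copied from Table 5; the unit is not stated in the source.
# Radical-BiLSTM-CRF is omitted because the table gives only a lower bound (>410).
time_per_epoch = {
    "BiLSTM-CRF": 416,
    "IDCNN-CRF": 209,
    "Lattice-LSTM-CRF": 7506,
    "BERT-fine-tuning": 1363,
    "BERT-IDCNN-CRF": 216,
}


def relative_cost(times, baseline="BERT-IDCNN-CRF"):
    """Return each model's time divided by the baseline model's time."""
    base = times[baseline]
    return {model: t / base for model, t in times.items()}


ratios = relative_cost(time_per_epoch)
for model, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{model:18s} {ratio:6.2f}x")
```

Under this reading, BERT fine-tuning takes roughly 6.3 times as long per epoch as BERT-IDCNN-CRF, while IDCNN-CRF without BERT is slightly faster.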