
《山东大学学报(理学版)》 (Journal of Shandong University (Natural Science)), 2020, Vol. 55, Issue 1: 102-109. doi: 10.6040/j.issn.1671-9352.2.2019.076


BERT-IDCNN-CRF for named entity recognition in Chinese

Ni LI1, Huan-mei GUAN2,*, Piao YANG2, Wen-yong DONG2

  1. State Key Laboratory of Power Grid Environmental Protection, China Electric Power Research Institute, Wuhan 430074, Hubei, China
  2. School of Computer Science, Wuhan University, Wuhan 430072, Hubei, China
  • Received: 2019-09-02  Online: 2020-01-20  Published: 2020-01-10
  • Corresponding author: Huan-mei GUAN. E-mail: lini@epri.sgcc.com.cn; hmguan@whu.edu.cn
  • First author: LI Ni (1982- ), female, M.S., senior engineer; her research interests include the electromagnetic environment and electromagnetic compatibility of power systems. E-mail: lini@epri.sgcc.com.cn
  • Supported by the Science and Technology Project of the Headquarters of State Grid Corporation of China (GY71-18-009)

Abstract:

Pre-trained language models can represent rich syntactic and semantic information in sentences and can model the polysemy of words, which has made them widely used in natural language processing; BERT (bidirectional encoder representations from transformers) is one such model. Existing named entity recognition (NER) methods based on fine-tuning BERT, however, must update a very large number of model parameters and therefore incur long training times. To address this problem, this paper proposes a Chinese NER method based on BERT-IDCNN-CRF (BERT-iterated dilated convolutional neural network-conditional random field). The method uses the pre-trained BERT model to obtain context-dependent representations of characters, then feeds the character-vector sequence into the IDCNN-CRF model for training. During training, the BERT parameters are kept fixed and only the IDCNN-CRF part is trained, which reduces the number of trainable parameters while preserving the modeling of polysemy. Experiments show that the model reaches an F1 score of 94.41% on the MSRA corpus, outperforming the previously best Lattice-LSTM model for Chinese NER by 1.23 percentage points; compared with fine-tuning BERT, its F1 score is slightly lower but its training time is sharply reduced. Applied to the recognition of sensitive entities in fields such as information security and public-opinion monitoring of the power-grid electromagnetic environment, the model is faster and more responsive.

Key words: Chinese named entity recognition, BERT, dilated convolution (IDCNN), conditional random field (CRF), information security

CLC number: TP391

Fig.1  Architecture of the BERT-IDCNN-CRF model

Fig.2  The BERT pre-trained language model

Fig.3  The Transformer encoder unit

Fig.4  The model training process

Fig.5  Illustration of dilated convolution

Table 1  Statistics of entity counts

Dataset        Location  Organization  Person   Total
Training set     36 517        20 571  17 615  74 703
Test set          2 877         1 331   1 973   6 181

Table 2  Experimental environment

OS          Ubuntu
CPU         i7-6700HQ @ 2.60 GHz
GPU         GTX 1070 (8 GB)
Python      3.6
TensorFlow  1.12.0
RAM         32 GB

Fig.6  F1 score of the BERT-IDCNN-CRF model during training

Fig.7  Results with different numbers of stacked dilated convolution blocks

Table 3  Recognition results by entity type (P, R, F1 in %)

Model           Type  P      R      F1
BERT-IDCNN-CRF  LOC   96.32  93.81  95.05
                ORG   88.86  91.06  89.94
                PER   96.95  96.16  96.55
                ALL   94.86  93.97  94.41

Table 4  Examples of prediction errors

Example 1  Sentence:   中国政府陪同团
           Gold:       中国政府陪同团-ORG
           Predicted:  中国-LOC
Example 2  Sentence:   委员会的安全任务更加繁重了
           Gold:       委员会-ORG
           Predicted:  (none)

Fig.8  Results of BERT fine-tuning

Table 5  Recognition results of different models (P, R, F1 in %)

Model               P      R      F1     Time per epoch/s
BiLSTM-CRF          88.80  87.16  87.97    416
IDCNN-CRF           89.39  84.64  86.95    209
Radical-BiLSTM-CRF  91.28  90.62  90.95   >410
Lattice-LSTM-CRF    93.57  92.79  93.18  7 506
BERT-fine-tuning    94.09  94.54  95.37  1 363
BERT-IDCNN-CRF      94.86  93.97  94.41    216
1  HAMMERTON J. Named entity recognition with long short-term memory[C]// Proceedings of the Conference on Natural Language Learning at HLT-NAACL. NJ: Association for Computational Linguistics, 2003.
2  LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al. Neural architectures for named entity recognition[J/OL]. arXiv:1603.01360, 2016.
3  MA X, HOVY E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF[J/OL]. arXiv:1603.01354, 2016.
4  CHIU J P C, NICHOLS E. Named entity recognition with bidirectional LSTM-CNNs[J]. Transactions of the Association for Computational Linguistics, 2016, 4: 357-370.
5  DONG C H, ZHANG J J, ZONG C Q, et al. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition[M]. Cham: Springer, 2016: 239-250.
6  HE J, WANG H. Chinese named entity recognition and word segmentation based on character[C]// Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing. [S.l.]: [s.n.], 2008.
7  LIU Z X, ZHU C H, ZHAO T J. Chinese named entity recognition with a sequence labeling approach: based on characters, or based on words?[M]// Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. Berlin: Springer, 2010: 634-640.
8  LI H, HAGIWARA M, LI Q, et al. Comparison of the impact of word segmentation on name tagging for Chinese and Japanese[C]// LREC. [S.l.]: [s.n.], 2014: 2532-2536.
9  CHEN W, ZHANG Y, ISAHARA H. Chinese named entity recognition with conditional random fields[C]// Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing. [S.l.]: [s.n.], 2006: 118-121.
10  LU Y, ZHANG Y, JI D. Multi-prototype Chinese character embedding[C]// LREC. [S.l.]: [s.n.], 2016.
11  ZHOU J S, QU W G, ZHANG F. Chinese named entity recognition via joint identification and categorization[J]. Chinese Journal of Electronics, 2013, 22: 225-230.
12  ZHAO H, KIT C. Unsupervised segmentation helps supervised learning of character tagging for word segmentation and named entity recognition[C]// Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing. [S.l.]: [s.n.], 2008.
13  PENG N, DREDZE M. Named entity recognition for Chinese social media with jointly trained embeddings[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. PA: Association for Computational Linguistics, 2015: 548-554.
14  HE H, SUN X. F-score driven max margin neural network for named entity recognition in Chinese social media[J/OL]. arXiv:1611.04234, 2016.
15  ZHANG Y, YANG J. Chinese NER using lattice LSTM[J/OL]. arXiv:1805.02023, 2018.
16  COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J/OL]. arXiv:1103.0398, 2011.
17  STRUBELL E, VERGA P, BELANGER D, et al. Fast and accurate entity recognition with iterated dilated convolutions[J/OL]. arXiv:1702.02098, 2017.
18  REI M. Semi-supervised multitask learning for sequence labeling[J/OL]. arXiv:1704.07156, 2017.
19  DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J/OL]. arXiv:1810.04805, 2018.
20  YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[J/OL]. arXiv:1511.07122, 2015.