
Journal of Shandong University (Natural Science) ›› 2017, Vol. 52 ›› Issue (9): 7-12. doi: 10.6040/j.issn.1671-9352.1.2016.PC7


Chinese entity relation extraction algorithms based on COAE2016 datasets

SUN Jian-dong, GU Xiu-sen, LI Yan, XU Wei-ran*

  1. Lab of Pattern Recognition and Intelligent System, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2016-11-25 Online: 2017-09-20 Published: 2017-09-15
  • Corresponding author: XU Wei-ran (born 1975), male, associate professor and master's supervisor; research interest: natural language processing. E-mail: xuweiran@bupt.edu.cn
  • About the author: SUN Jian-dong (born 1994), male, master's student; research interest: natural language processing. E-mail: sunjd@bupt.edu.cn
  • Supported by:
    111 Project (B08004); National Natural Science Foundation of China (61300080, 61273217, 61671078); Doctoral Program Foundation of the Ministry of Education of China (20130005110004)

Abstract: Entity relation extraction is an important step in knowledge graph technology. Research on entity relation extraction for English is comparatively mature. By contrast, progress on Chinese entity relation extraction has been less satisfactory, mainly because of the lack of relevant corpora. To address this problem, COAE2016 introduced a Chinese entity relation extraction task as Task 3. In this paper, we apply three algorithms to the task: a pattern-based algorithm, an SVM-based algorithm, and a CNN-based algorithm. We then compare the advantages and disadvantages of the three algorithms according to their performance on the COAE2016 Task 3 evaluation dataset. Experiments show that both the SVM-based algorithm and the CNN-based algorithm perform well on the evaluation dataset.

Key words: relation extraction, pattern matching, SVM, CNN

CLC number:

  • TP391
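
To make the pattern-based (template matching) approach named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the two entity mentions are already given, masks them with placeholders, and tests a few hand-written templates. The relation labels, the templates, and the function name `classify_relation` are all hypothetical and are not taken from the COAE2016 data.

```python
import re
from typing import Optional

# Hypothetical templates (illustrative only, not from COAE2016):
# relation label -> regex over a sentence in which the two entity
# mentions have been replaced by the placeholders <E1> and <E2>.
TEMPLATES = {
    "spouse": re.compile(r"<E1>的(?:妻子|丈夫)是<E2>"),
    "birthplace": re.compile(r"<E1>.{0,5}出生于<E2>"),
}


def classify_relation(sentence: str, e1: str, e2: str) -> Optional[str]:
    """Return the label of the first template that matches, or None."""
    # Mask the entity mentions so that templates stay entity-independent.
    masked = sentence.replace(e1, "<E1>").replace(e2, "<E2>")
    for label, pattern in TEMPLATES.items():
        if pattern.search(masked):
            return label
    return None


if __name__ == "__main__":
    print(classify_relation("张三的妻子是李四。", "张三", "李四"))
    print(classify_relation("王五1990年出生于北京。", "王五", "北京"))
    print(classify_relation("张三和李四一起吃饭。", "张三", "李四"))
```

A sketch like this shows the usual trade-off of template methods: each template is precise and easy to inspect, but any sentence that no template anticipates is left unlabeled, which is one reason learned classifiers such as SVM or CNN models are compared against it.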