Journal of Shandong University (Natural Science), 2016, Vol. 51, Issue (3): 77-85. doi: 10.6040/j.issn.1671-9352.1.2015.070
SUN He1,2, LI Shu-qin2, LÜ Xue-qiang1,2*, LIU Ke-hui3,4
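Abstract: Geographic location entities in microblog complaint texts tend to be structurally complex, relatively long, and described in considerable detail. Based on an analysis of complaint microblogs, we propose a method for automatically recognizing these entities. The method first annotates microblog text with features drawn from a feature resource library and recognizes location entities with a conditional random field (CRF) model. Second, guided by the characteristics of microblogs and of location entities, it performs a second annotation pass on the CRF output. Finally, it applies a microblog rule base to the recognition results to recover entities the earlier stages missed and to correct the recognized location entities. Experimental results show that the method is effective, reaching an F-score of 85.52%.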
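The abstract only outlines the three stages, so the following is a minimal sketch of how the second and third stages could post-process CRF output, not the authors' implementation. It assumes the CRF has already produced one BIO tag (`B-LOC`, `I-LOC`, `O`) per character. The connector set `CONNECTORS`, the trigger-word rule `LOCATION_RULE`, and every function name are hypothetical illustrations standing in for the paper's feature and rule libraries, which are not described here. Scoring uses exact-match entity-level F1, a common NER convention; the paper's evaluation protocol is not stated in the abstract.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)

# Hypothetical resources for illustration only; not the paper's rule base.
CONNECTORS = set("的与和及")  # characters allowed between two fragments that get merged
# Hypothetical recall rule: after the trigger character 在 or 于, take the shortest
# run of Chinese characters ending in a street/compound suffix, plus an optional
# house number such as "12号".
LOCATION_RULE = re.compile(r"(?<=[在于])[\u4e00-\u9fff]{1,10}?(?:路|街|巷|胡同|小区)(?:\d+号)?")


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert per-character BIO tags (B-LOC / I-LOC / O) into entity spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B-LOC":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I-LOC":
            if start is None:  # tolerate an I-LOC with no preceding B-LOC
                start = i
        elif start is not None:  # any other tag ("O") closes the open entity
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def merge_fragments(text: str, spans: List[Span], max_gap: int = 1) -> List[Span]:
    """Hypothetical second pass: join fragments that touch or are separated
    only by a short run of connector characters."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = text[prev_end:start]  # empty when spans touch or overlap
            if len(gap) <= max_gap and all(ch in CONNECTORS for ch in gap):
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


def apply_rules(text: str, spans: List[Span]) -> List[Span]:
    """Hypothetical third pass: add rule matches the CRF missed, and widen
    any span that overlaps a rule match to the union of both."""
    result = list(spans)
    for m in LOCATION_RULE.finditer(text):
        r_start, r_end = m.start(), m.end()
        overlapping = [s for s in result if s[0] < r_end and r_start < s[1]]
        for s in overlapping:
            result.remove(s)
            r_start, r_end = min(r_start, s[0]), max(r_end, s[1])
        result.append((r_start, r_end))
    return sorted(result)


def recognize(text: str, crf_tags: List[str]) -> List[Span]:
    """Run the hypothetical post-CRF passes on one microblog."""
    if len(text) != len(crf_tags):
        raise ValueError("expected one CRF tag per character")
    return apply_rules(text, merge_fragments(text, bio_to_spans(crf_tags)))


def entity_f1(pred: List[Span], gold: List[Span]) -> float:
    """Exact-match entity-level F1."""
    p, g = set(pred), set(gold)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "我家住在中关村大街12号附近"
    # Simulated CRF output: two fragments (中关村, 大街), house number missed.
    tags = ["O"] * 4 + ["B-LOC", "I-LOC", "I-LOC", "B-LOC", "I-LOC"] + ["O"] * 5
    print([text[s:e] for s, e in recognize(text, tags)])  # ['中关村大街12号']
```

In this toy input the merge pass joins the touching fragments 中关村 and 大街 into one span, and the rule pass then widens it to cover the house number, so the CRF-only output scores an F1 of 0 against the gold span while the post-processed output matches it exactly. The order (merge first, then rules) mirrors the abstract's sequence; a real rule base would need far more triggers and suffixes than this single pattern.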