
Journal of Shandong University (Natural Science) ›› 2016, Vol. 51 ›› Issue (3): 77-85. doi: 10.6040/j.issn.1671-9352.1.2015.070


Recognition of geographical entity in city complaints of Micro-blog

SUN He1,2, LI Shu-qin2, LÜ Xue-qiang1,2*, LIU Ke-hui3,4

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing 100101, China;
    2. College of Computer, Beijing Information Science and Technology University, Beijing 100101, China;
    3. School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China;
    4. Beijing Research Center of Urban Systems Engineering, Beijing 100035, China
  • Received: 2015-11-14 Online: 2016-03-20 Published: 2016-04-07
  • Corresponding author: LÜ Xue-qiang (born 1970), male, professor; research interests: Chinese information processing and multimedia information processing. E-mail: lv.xueqiang@trs.com.cn
  • About the author: SUN He (born 1987), male, master's student; research interest: Chinese information processing. E-mail: s-hehe@126.com
  • Funding:
    National Natural Science Foundation of China (61271304); Program for Innovative Team Building and Teacher Career Development of Beijing Municipal Universities (IDHT20130519); Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation (KZ201311232037); Beijing Municipal Finance Project (PXM2014-17825-000005); Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDD2015)

Abstract: Geographical location entities in microblog complaint texts tend to be structurally complex, long, and described in detail. Based on an analysis of complaint microblogs, this paper proposes a method for recognizing such entities automatically. First, the method annotates microblog text with features drawn from a feature resource library and uses a conditional random field (CRF) model to identify geographical location entities. Second, the CRF output is annotated a second time according to the characteristics of microblogs and of geographical location entities. Finally, a microblog rule bank is used to recover entities that were missed and to correct the recognized entities, which yields the final recognition result. Experimental results show that the method is effective, with an F-score of up to 85.52%.

Key words: city complaints of Micro-blog, rule bank of Micro-blog, recognition of geographical entity, CRF

CLC number: TP393
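
The abstract describes feature annotation, CRF labeling, and a rule-based pass that recovers missed parts of long location entities, evaluated by F-score. The sketch below illustrates only the non-CRF parts on a toy tokenized sentence. Everything in it is an assumption made for illustration: the lexicons `SUFFIXES`, `TRIGGERS`, and `EXTENDERS`, the BIO tag `LOC`, the single extension rule, and the hard-coded `crf_out` labels standing in for a trained CRF. None of these come from the paper's actual feature repository or rule bank.

```python
# Hypothetical lexicons; the paper's feature repository and rule bank are not reproduced here.
SUFFIXES = {"Road", "Street", "Bridge", "Community"}  # place-name tail words
TRIGGERS = {"at", "near", "on"}                       # words that often precede a location
EXTENDERS = {"No.", "Building", "Gate"}               # address detail words


def annotate(tokens):
    """Attach simple lexicon features to each token (the kind of input a CRF would use)."""
    feats = []
    for i, tok in enumerate(tokens):
        feats.append({
            "token": tok,
            "is_suffix": tok in SUFFIXES,
            "after_trigger": i > 0 and tokens[i - 1] in TRIGGERS,
            "is_number": tok.isdigit(),
        })
    return feats


def extend_with_rules(tokens, labels):
    """Rule-based supplement: grow a location span over trailing address details."""
    fixed = list(labels)
    for i in range(1, len(tokens)):
        prev_in_loc = fixed[i - 1] in ("B-LOC", "I-LOC")
        is_detail = tokens[i] in EXTENDERS or tokens[i].isdigit()
        if fixed[i] == "O" and prev_in_loc and is_detail:
            fixed[i] = "I-LOC"
    return fixed


def spans(labels):
    """Convert BIO labels to a set of (start, end) entity spans, end exclusive."""
    found, start = set(), None
    for i, lab in enumerate(list(labels) + ["O"]):  # sentinel closes a trailing span
        if lab != "I-LOC" and start is not None:
            found.add((start, i))
            start = None
        if lab == "B-LOC":
            start = i
    return found


def f_score(gold, pred):
    """Entity-level F-score over exact span matches."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


tokens = ["pothole", "at", "Xueyuan", "Road", "No.", "37", "please", "fix"]
gold = ["O", "O", "B-LOC", "I-LOC", "I-LOC", "I-LOC", "O", "O"]
crf_out = ["O", "O", "B-LOC", "I-LOC", "O", "O", "O", "O"]  # hypothetical CRF output

features = annotate(tokens)
fixed = extend_with_rules(tokens, crf_out)
before = f_score(spans(gold), spans(crf_out))
after = f_score(spans(gold), spans(fixed))
```

In this toy case the stand-in CRF labels cut the entity off after "Road", so the exact-match F-score is 0.0 before the rule pass and 1.0 after it. Scoring on exact spans is a deliberate choice here: it penalizes truncated long entities, which is the failure the supplementing step is meant to address.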