Chinese Micro-blog named entity linking based on multisource knowledge

doi:10.6040/j.issn.1671-9352.3.2014.026

Journal of Shandong University (Natural Science), 2015, Vol. 50, Issue (07): 9-16.

ZAN Hong-ying¹, WU Yong-gang¹, JIA Yu-xiang¹, NIU Gui-ling²

1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, Henan, China;
2. School of Foreign Language, Zhengzhou University, Zhengzhou 450001, Henan, China

Received: 2015-03-03 Online: 2015-07-20 Published: 2015-07-31
About the author: ZAN Hong-ying (1966-), female, Ph.D., professor; research interest: natural language processing. E-mail: iehyzan@zzu.edu.cn
Funding:
Supported by the National Natural Science Foundation of China (61402419, 60970083, 61272221); the National Social Science Foundation of China (14BYY096); the National High Technology Research and Development Program of China (863 Program) (2012AA011101); the Science and Technology Research Program of the Henan Provincial Department of Science and Technology (132102210407); the Basic Research Program of the Henan Provincial Department of Science and Technology (142300410231, 142300410308); the Key Science and Technology Research Project of the Henan Provincial Department of Education (12B520055, 13B520381); and the Open Project of the Key Laboratory of Computational Linguistics (Peking University), Ministry of Education (201401)

Abstract

Named entities are important information-bearing units in text. Micro-blog is a social network platform for sharing brief, real-time information; its texts are short and nonstandard, and neologisms appear frequently. An accurate understanding of the named entities is therefore needed to analyze the text correctly. A Chinese Micro-blog entity linking strategy based on multi-source knowledge is proposed, combining a dictionary of synonyms, encyclopedia resources, and the bag-of-words model to link named entities. In this strategy, the named entities to be linked in a micro-blog are mapped to the corresponding candidate entities in the knowledge base. In experiments on the data set of the NLP&CC 2013 Chinese micro-blog entity linking track, the method obtains a micro-average accuracy of 92.97%, two percentage points higher than the best result reported in the NLP&CC 2013 Chinese entity linking evaluation, which demonstrates the effectiveness of the method.

Key words: named entity, Chinese Micro-blog entity linking, dictionary of synonyms, encyclopedia resources, bag-of-words model
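
The abstract describes the linking pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a toy synonym dictionary (`SYNONYMS`), a toy knowledge base with pre-tokenized encyclopedia descriptions (`KNOWLEDGE_BASE`), cosine similarity as the bag-of-words score, and a `"NIL"` label for mentions with no candidate. All of these names, entries, and choices are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy resources; the paper's actual synonym dictionary,
# encyclopedia data, and knowledge base are not reproduced here.
SYNONYMS = {"魔都": "上海", "帝都": "北京"}
KNOWLEDGE_BASE = {
    "KB001": {"name": "上海", "description": ["城市", "中国", "东部", "金融", "外滩"]},
    "KB002": {"name": "苹果", "description": ["公司", "手机", "电脑", "乔布斯"]},
    "KB003": {"name": "苹果", "description": ["水果", "果树", "甜", "营养"]},
}


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link(mention, context_tokens):
    """Map a mention to a knowledge-base id, or "NIL" if no candidate exists."""
    # Step 1: normalize the mention through the synonym dictionary.
    name = SYNONYMS.get(mention, mention)
    # Step 2: candidate generation by exact name match in the knowledge base.
    candidates = [k for k, e in KNOWLEDGE_BASE.items() if e["name"] == name]
    if not candidates:
        return "NIL"
    # Step 3: rank candidates by bag-of-words similarity between the
    # micro-blog context and each candidate's description.
    context = Counter(context_tokens)
    scored = [
        (cosine(context, Counter(KNOWLEDGE_BASE[k]["description"])), k)
        for k in candidates
    ]
    return max(scored)[1]


def micro_accuracy(predicted, gold):
    """Fraction of all mentions, pooled together, that are linked correctly."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

In this sketch, `link("苹果", ["新", "手机", "发布"])` prefers the company entry because only its description shares a token with the context, while `link("魔都", ["外滩", "夜景"])` resolves through the synonym dictionary before any matching takes place. Micro-averaging pools every mention into one count, so frequent entity types weigh more than rare ones.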

CLC number: TP391

ZAN Hong-ying, WU Yong-gang, JIA Yu-xiang, NIU Gui-ling. Chinese Micro-blog named entity linking based on multisource knowledge[J]. Journal of Shandong University (Natural Science), 2015, 50(07): 9-16.

