Journal of Shandong University (Natural Science) ›› 2015, Vol. 50 ›› Issue (07): 9-16. doi: 10.6040/j.issn.1671-9352.3.2014.026
昝红英1, 吴泳钢1, 贾玉祥1, 牛桂玲2
ZAN Hong-ying1, WU Yong-gang1, JIA Yu-xiang1, NIU Gui-ling2
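Abstract: Named entities are important information-bearing units in text. Microblogs are a social network platform for sharing short, real-time messages; their texts are brief, non-standard, and frequently contain new words, so their named entities must be understood accurately before the text can be analyzed correctly. This paper proposes a multi-source-knowledge approach to named entity linking for Chinese microblogs, which combines knowledge such as synonym dictionaries and encyclopedia resources with a bag-of-words model to link named entities. In experiments on the NLP&CC2013 Chinese microblog entity linking evaluation dataset, the approach achieved a micro-averaged accuracy of 92.97%, two percentage points higher than the best result in the NLP&CC2013 Chinese entity linking evaluation.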
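The general idea of combining a synonym dictionary with a bag-of-words comparison against encyclopedia text can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary `SYNONYMS`, the toy knowledge base `KB`, the whitespace tokenizer, the cosine scoring, and the `threshold` parameter are all assumptions made for illustration, and English strings stand in for segmented Chinese text. The `micro_accuracy` helper assumes that micro-averaged accuracy means correct links divided by all mentions pooled together.

```python
import math
from collections import Counter

# Hypothetical synonym dictionary: surface form -> candidate entity names.
SYNONYMS = {
    "Apple": ["Apple Inc.", "apple (fruit)"],
    "Big Apple": ["New York City"],
}

# Hypothetical knowledge base: entity name -> encyclopedia-style description.
KB = {
    "Apple Inc.": "technology company iphone mac computer",
    "apple (fruit)": "fruit tree sweet eat orchard",
    "New York City": "city united states manhattan",
}


def bag_of_words(text):
    """Count lowercase whitespace tokens (a stand-in for word segmentation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link(mention, context, threshold=0.0):
    """Return the best-matching KB entity for a mention, or "NIL" if none."""
    # Candidate generation: synonym dictionary plus an exact KB title match.
    candidates = list(SYNONYMS.get(mention, []))
    if mention in KB and mention not in candidates:
        candidates.append(mention)
    candidates = [c for c in candidates if c in KB]
    if not candidates:
        return "NIL"
    # Candidate ranking: compare the microblog context with each description.
    ctx = bag_of_words(context)
    score, best = max((cosine(ctx, bag_of_words(KB[c])), c) for c in candidates)
    return best if score >= threshold else "NIL"


def micro_accuracy(predicted, gold):
    """Fraction of mentions whose predicted link equals the gold link."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold) if gold else 0.0
```

For example, `link("Apple", "I bought a new iphone from Apple")` selects `"Apple Inc."` because only that description shares a token with the context, while a mention with no dictionary or title match is returned as `"NIL"`. Raising `threshold` above zero would additionally reject candidates whose context overlap is too weak.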