Journal of Shandong University (Natural Science), 2017, Vol. 52, Issue (7): 52-58. doi: 10.6040/j.issn.1671-9352.1.2016.072
杜漫,徐学可,杜慧,伍大勇,刘悦,程学旗
DU Man, XU Xue-ke, DU Hui, WU Da-yong, LIU Yue, CHENG Xue-qi
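Abstract: We propose a word-embedding learning method for emotion classification that incorporates word-internal information and emotion labels. Building on the CBOW model, the method introduces the internal components of words together with emotion-label information, both to cope with the irregular way emotions are expressed on microblogs and to enrich the emotional semantics of the word vectors. For an input text, the word vectors are summed with weights given by each word's TF-IDF score, and the result serves as the text vector representation. These word vectors or text vectors are then fed to an emotion classifier, and machine-learning classification methods (LR, SVM, CNN) are used to evaluate the emotion word vectors on the emotion classification task. Experiments show that the emotion word vectors outperform the original CBOW word vectors on precision, recall, and F-score.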
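The TF-IDF-weighted text vector described in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant is used, so the relative term frequency and the smoothed IDF below are assumptions, as are the function names `idf_table` and `text_vector` and the toy two-dimensional embeddings; the sketch does not reproduce the emotion word vectors themselves.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed inverse document frequency for each word in a tokenized corpus.

    Assumed variant: idf(w) = ln((1 + N) / (1 + df(w))) + 1.
    """
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def text_vector(tokens, embeddings, idf, dim):
    """Return the TF-IDF-weighted sum of the word vectors of `tokens`."""
    vec = [0.0] * dim
    if not tokens:
        return vec
    for word, count in Counter(tokens).items():
        if word not in embeddings or word not in idf:
            continue  # skip words without a vector or an IDF value
        weight = (count / len(tokens)) * idf[word]  # tf * idf
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
    return vec


# Toy data (hypothetical): three tokenized posts and 2-dimensional word vectors.
corpus = [["happy", "day"], ["sad", "day"], ["happy", "happy", "news"]]
embeddings = {
    "happy": [1.0, 0.0],
    "sad": [-1.0, 0.0],
    "day": [0.0, 1.0],
    "news": [0.0, 0.5],
}
idf = idf_table(corpus)
doc_vec = text_vector(corpus[0], embeddings, idf, dim=2)
```

The resulting `doc_vec` has the same dimension as the word vectors, so it can be passed directly to a classifier such as LR or SVM; a CNN would instead consume the sequence of word vectors.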