Journal of Shandong University (Natural Science) ›› 2017, Vol. 52 ›› Issue (7): 52-58. doi: 10.6040/j.issn.1671-9352.1.2016.072

Emotion-specific word embedding learning for emotion classification

DU Man, XU Xue-ke, DU Hui, WU Da-yong, LIU Yue, CHENG Xue-qi

  1. CAS Key Lab of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2016-11-25 Online: 2017-07-20 Published: 2017-07-07
  • About the author: DU Man (b. 1991), female, master's student; research interests: sentiment and emotion classification in natural language understanding. E-mail: duman@ict.ac.cn

Abstract: We propose a word embedding learning method for emotion classification that incorporates the inner patterns of words and their emotion labels. Building on the CBOW model, we introduce the internal components of words and emotion label information, both to cope with the irregular way emotion is expressed in microblogs and to enrich the emotional semantics of the word vectors. For an input text, the word vectors are summed with weights given by each word's TF-IDF score, and the result serves as the text vector. The word vectors or text vectors are then used as input to an emotion classifier built with machine learning methods (LR, SVM, CNN) to evaluate the emotion-specific word embeddings on the emotion classification task. Experiments show that the emotion-specific word embeddings outperform the original CBOW word embeddings on precision, recall, F-score, and the other metrics examined.

Key words: word embedding, emotion analysis, emotion labels, emotion classification, word inner pattern
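The TF-IDF weighted text representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant is used, so the weighting here (relative term frequency times `log(N / df) + 1`) is an assumption, and the function names `inverse_document_frequency` and `text_vector` and the toy embeddings are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def inverse_document_frequency(corpus):
    """Map each word to log(N / df) + 1 over a tokenized corpus.

    Assumed IDF variant; the paper's exact formula is not given in the abstract.
    """
    n_docs = len(corpus)
    df = Counter(word for tokens in corpus for word in set(tokens))
    return {word: math.log(n_docs / count) + 1.0 for word, count in df.items()}


def text_vector(tokens, embeddings, idf, dim):
    """Sum the word vectors of one text, each weighted by its TF-IDF score.

    Words without an embedding or an IDF entry are skipped.
    """
    vec = [0.0] * dim
    if not tokens:
        return vec
    counts = Counter(tokens)
    for word, count in counts.items():
        if word not in embeddings or word not in idf:
            continue
        weight = (count / len(tokens)) * idf[word]  # tf * idf
        for i, value in enumerate(embeddings[word]):
            vec[i] += weight * value
    return vec


# Toy data (hypothetical): two tokenized texts and 2-dimensional word vectors.
corpus = [["happy", "day"], ["sad", "day"]]
embeddings = {"happy": [1.0, 0.0], "sad": [0.0, 1.0], "day": [1.0, 1.0]}
idf = inverse_document_frequency(corpus)
doc_vec = text_vector(corpus[0], embeddings, idf, dim=2)
```

In this toy case "day" occurs in both texts and so receives a lower weight than "happy", which pulls the text vector toward the more distinctive word. A vector like `doc_vec` would be what is passed to a classifier such as LR or SVM.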

CLC number: TP391.1