
Journal of Shandong University (Natural Science), 2024, Vol. 59, Issue (3): 81-94. doi: 10.6040/j.issn.1671-9352.1.2022.3548


Emoji embedded representation based on emotion distribution

Xueqiang ZENG, Yu SUN, Ye LIU, Zhongying WAN*, Jiali ZUO, Mingwen WANG

  1. School of Computer & Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
  • Received: 2023-05-04 Online: 2024-03-20 Published: 2024-03-06
  • Contact: Zhongying WAN E-mail: xqzeng@jxnu.edu.cn; libby@jxnu.edu.cn
  • About the author: Xueqiang ZENG (born 1978), male, professor, Ph.D.; research interests: natural language processing, sentiment analysis, data dimensionality reduction. E-mail: xqzeng@jxnu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62266021); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ2200330)

Abstract:

This paper proposes EDEER, an emoji embedded representation method based on emotion distribution. EDEER uses the soft labels of a BERT-based emotion prediction model to learn emoji embedded representations from real data, and uses the emotion distribution to model directly the degree to which an emoji expresses each emotion, so that the embedded representation carries the multiple kinds of emotional information conveyed by the emoji. Multiple sets of comparative experiments on Chinese Weibo data containing emoji show that the proposed method effectively learns emoji embedded representations that are directly associated with fine-grained emotions, and builds an emoji representation space with high emotional expression quality.

Key words: emoji, sentiment analysis, embedded representation, emotion distribution
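
The abstract describes learning from the soft labels of an emotion prediction model and modelling how strongly each emoji expresses each emotion. The following is only a minimal sketch of one way such an emotion distribution could be aggregated, and it is not the authors' EDEER procedure: it assumes that every sentence already has a soft label (a probability vector over the 7 emotions) and simply averages the soft labels of the sentences in which an emoji occurs. The function name, the emoji strings, and the example vectors are all hypothetical.

```python
from collections import defaultdict

# The 7 emotions, in the order listed in the Figure 1 caption.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "like", "sadness", "surprise"]


def emoji_emotion_distributions(samples):
    """Average sentence-level soft labels per emoji.

    samples: iterable of (emojis, soft_label) pairs, where `emojis` lists the
    emoji found in one sentence and `soft_label` is a probability vector over
    EMOTIONS (assumed to come from some emotion prediction model).
    Returns a dict mapping each emoji to its averaged emotion distribution.
    """
    sums = defaultdict(lambda: [0.0] * len(EMOTIONS))
    counts = defaultdict(int)
    for emojis, soft_label in samples:
        if len(soft_label) != len(EMOTIONS):
            raise ValueError("soft label must have one entry per emotion")
        # Count each emoji once per sentence, even if it is repeated.
        for emoji in set(emojis):
            for i, p in enumerate(soft_label):
                sums[emoji][i] += p
            counts[emoji] += 1
    return {e: [s / counts[e] for s in sums[e]] for e in sums}


if __name__ == "__main__":
    # Hypothetical soft labels, invented purely for illustration.
    demo = [
        (["[haha]"], [0.0, 0.1, 0.0, 0.7, 0.2, 0.0, 0.0]),
        (["[haha]", "[sad]"], [0.1, 0.1, 0.0, 0.3, 0.1, 0.4, 0.0]),
    ]
    for emoji, dist in emoji_emotion_distributions(demo).items():
        print(emoji, [round(p, 2) for p in dist])
```

Because each input vector sums to 1, each averaged vector also sums to 1, so the result can be read directly as a distribution over the 7 emotions.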

CLC number:

  • TP391

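Table 1

Examples from the WEC Weibo dataset

No.  Example  Emotion
1  Thankful for everything, love you all
2  The rich flavor of the first lunar month makes me miss my family far away in my hometown
3  Only a life of striving has meaning; only a full life deserves to be called a life
4  I really don't know how to keep chatting with people who reply perfunctorily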

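Figure 1

Emotion distributions of [emoji] and [emoji] over the 7 emotions. Note: the numbers on the left of the vertical axis give the degree to which the emoji expresses each emotion; a larger value means the emoji expresses that emotion more strongly. Horizontal axis: 1. anger; 2. disgust; 3. fear; 4. happiness; 5. like; 6. sadness; 7. surprise.

Figure 2

Schematic diagram of the BERT-based emotion prediction model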

Table 2

Experimental datasets

Dataset  Number of emoji  Sentences containing emoji  Total sentences
NLP&CC2013  110  1509  10487
NLP&CC2014  28  637  5918
WEC  191  8961  39660
Total  262  11107  56065

Table 3

Number of annotated sentences for the 64 emoji over the 7 emotions

Emoji descriptor  Total sentences  Annotated sentences for each of the 7 emotions
(missing)  1787  153 186 27 171 62 1161 27
哈哈 (haha)  629  8 42 4 457 67 38 13
抓狂 (going crazy)  606  152 160 14 25 23 224 8
(missing)  539  6 8 5 273 168 74 5
(missing)  459  273 80 3 9 4 82 8
嘻嘻 (grin)  401  7 19 3 303 45 22 2
(missing)  352  53 80 10 13 3 168 25
(missing)  309  53 108 6 25 8 80 29
拜拜 (bye-bye)  295  30 15 1 15 6 222 6
悲伤 (sad)  293  19 21 6 17 5 224 1
偷笑 (snicker)  276  8 28 2 172 33 26 7
伤心 (heartbroken)  267  22 22 4 11 10 196 2
(missing)  263  74 84 5 15 3 77 5
生病 (sick)  258  30 53 11 8 6 147 3
呵呵 (hehe)  239  8 20 2 84 44 78 3
可怜 (pitiful)  208  18 24 8 23 11 121 3
失望 (disappointed)  206  12 27 8 6 11 139 3
害羞 (shy)  187  7 20 10 82 37 28 3
蜡烛 (candle)  185  20 12 1 20 16 114 2
可爱 (cute)  179  1 10 2 82 49 30 5
(missing)  177  25 53 3 16 6 59 15
月亮 (moon)  176  7 11 3 62 47 44 2
鼓掌 (applause)  174  2 13 0 95 53 9 2
委屈 (aggrieved)  167  13 19 4 9 8 112 2
奥特曼 (Ultraman)  158  9 21 1 49 45 28 5
黑线 (black lines)  145  29 52 1 10 3 38 12
兔子 (rabbit)  141  6 11 0 85 16 20 3
(missing)  130  2 20 0 60 29 14 5
泪流满面 (tears streaming down)  126  13 10 5 10 5 81 2
怒骂 (cursing)  119  51 28 0 3 2 32 3
太阳 (sun)  112  6 2 0 67 17 17 3
(missing)  110  2 2 3 65 30 5 3
吃惊 (surprised)  109  14 22 3 9 2 17 42
鄙视 (despise)  106  37 38 1 5 4 17 4
思考 (thinking)  105  8 30 2 20 15 19 11
亲亲 (kiss)  104  1 6 0 57 27 12 1
睡觉 (sleep)  103  8 34 3 21 7 28 2
(missing)  98  4 2 0 20 70 2 0
浮云 (floating clouds)  96  8 17 0 21 9 39 2
笑哈哈 (laughing)  92  4 5 0 62 11 8 2
花心 (smitten)  87  2 6 0 42 30 2 5
馋嘴 (drooling)  85  3 14 0 49 9 9 1
威武 (mighty)  78  3 5 0 34 22 9 5
微风 (breeze)  71  1 8 1 35 12 12 2
围观 (onlooker)  64  3 8 0 28 9 14 2
(missing)  62  14 28 0 2 1 16 1
做鬼脸 (making faces)  61  2 8 1 32 10 7 1
熊猫 (panda)  61  4 4 1 12 14 21 5
蛋糕 (cake)  57  0 0 0 28 24 4 1
猪头 (pig head)  57  9 10 0 15 12 10 1
崩溃 (breakdown)  53  19 12 1 1 2 18 0
话筒 (microphone)  52  3 8 0 9 18 14 0
愤怒 (angry)  51  35 6 0 1 1 7 1
疑问 (question)  51  8 10 0 6 4 20 3
鲜花 (flower)  49  1 0 0 22 16 9 1
闭嘴 (shut up)  46  7 7 1 2 2 23 4
神马 (shenma, "what")  44  10 9 0 8 5 11 1
(missing)  42  1 20 0 3 8 8 2
(missing)  38  16 10 0 0 1 11 0
下雨 (rain)  37  3 6 1 4 4 18 1
悲催 (miserable)  36  0 7 1 1 1 26 0
干杯 (cheers)  35  1 0 0 21 8 5 0
抱抱 (hug)  32  1 1 0 25 5 0 0
(missing)  30  5 7 2 0 4 12 0
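
As a small worked example of reading Table 3, the sketch below normalizes one row of annotation counts into a count-based emotion distribution and reports its dominant emotion. This is plain frequency normalization, not the soft-label approach named in the abstract. The counts are copied from the 哈哈 row; the mapping of the seven count columns to emotion names is an assumption borrowed from the axis legend of Figure 1, since the table itself does not label them, and the function names are hypothetical.

```python
# Counts copied from the 哈哈 row of Table 3 (629 sentences in total).
HAHA_COUNTS = [8, 42, 4, 457, 67, 38, 13]

# ASSUMPTION: column order follows the axis legend of Figure 1.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "like", "sadness", "surprise"]


def counts_to_distribution(counts):
    """Turn per-emotion sentence counts into relative frequencies."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an all-zero count vector")
    return [c / total for c in counts]


def dominant_emotion(counts, names=EMOTIONS):
    """Return (emotion name, share) for the most frequent emotion."""
    dist = counts_to_distribution(counts)
    best = max(range(len(dist)), key=dist.__getitem__)
    return names[best], dist[best]


if __name__ == "__main__":
    name, share = dominant_emotion(HAHA_COUNTS)
    print(f"dominant emotion: {name} ({share:.3f} of annotated sentences)")
```

The same two functions apply to any other row of the table; rows whose counts are spread across several emotions give flatter distributions, which is the kind of information a single hard label would discard.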

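Table 4

Sentiment polarity of the 7 emotions and the number of emoji for each

Emotion  Sentiment polarity  Number of emoji
(missing)  positive  27
(missing)  positive  2
(missing)  negative  5
(missing)  negative  8
(missing)  negative  21
(missing)  negative  0
(missing)  ambiguous  1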

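Figure 3

Sentiment polarity of the 64 emoji

Figure 4

t-SNE visualization of the 64 emoji vectors under different emoji representation methods

Figure 5

Emotional consistency accuracy of different emoji representation methods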

Table 5

Emotion mapping results for the 64 emoji

Model  Accuracy/%  Average accuracy/%
CWV  0.00 1.67 11.11 0.00 9.52 0.00  9.38
DSG  80.00 8.89 37.04 50.00 42.86 100.00  43.75
fastText  60.00 6.67 3.70 0.00 0.00 100.00  7.81
BERT-EDEER  80.00 8.89 96.30 50.00 90.48 100.00  82.81

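Figure 6

Heat map of the associations between the 64 emoji and the 7 emotions. Note: the values on the right give the strength of association between each emoji and each emotion; a darker color means a stronger association.

Figure 7

Heat map of the associations among the 7 emotions. Note: the values on the right give the strength of association between each pair of emotions; a darker color means a stronger association.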
