JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2024, Vol. 59, Issue (3): 81-94. doi: 10.6040/j.issn.1671-9352.1.2022.3548


Emoji embedded representation based on emotion distribution

Xueqiang ZENG, Yu SUN, Ye LIU, Zhongying WAN*, Jiali ZUO, Mingwen WANG

  1. School of Computer & Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
  • Received: 2023-05-04; Online: 2024-03-20; Published: 2024-03-06
  • Contact: Zhongying WAN; E-mail: xqzeng@jxnu.edu.cn; libby@jxnu.edu.cn

Abstract:

This paper proposes an emoji embedded representation method based on emotion distribution (EDEER). EDEER uses the soft labels produced by a BERT-based emotion prediction model to learn emoji embeddings from real data. It models the degree to which an emoji expresses each emotion directly as an emotion distribution, so that the learned representation captures the full range of the emoji's emotional information. Multiple comparative experiments on a Chinese Weibo dataset containing emoji show that the proposed method can effectively learn emoji embeddings that are directly tied to fine-grained emotions, and can build an emoji representation space with high emotional expression quality.
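The abstract does not spell out how the soft labels become an emoji's emotion distribution. The following is a minimal sketch under two assumptions: the BERT model outputs one logit vector over the 7 emotions per sentence, and an emoji's distribution is the plain average of the softmax (soft-label) vectors of all sentences that contain it. The function names and the logits are hypothetical, not taken from the paper.

```python
import math

def softmax(logits):
    """Turn one sentence's logits into a soft label (probabilities summing to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def emoji_emotion_distribution(sentence_logits):
    """Average the soft labels of every sentence that contains one emoji.

    Assumption for illustration only: plain (unweighted) averaging over sentences.
    """
    soft_labels = [softmax(l) for l in sentence_logits]
    n = len(soft_labels)
    k = len(soft_labels[0])
    return [sum(s[j] for s in soft_labels) / n for j in range(k)]

# Hypothetical logits over 7 emotions for three sentences containing the same emoji
logits = [
    [0.1, 0.2, -1.0, 3.0, 1.5, 0.0, -0.5],
    [0.0, 0.1, -0.8, 2.5, 2.0, 0.3, -0.2],
    [0.3, 0.0, -1.2, 2.8, 1.0, 0.5, 0.1],
]
dist = emoji_emotion_distribution(logits)
print([round(p, 3) for p in dist])
```

Because an average of probability vectors is itself a probability vector, the result is still a valid emotion distribution. Unlike a one-hot hard label, it keeps the weaker emotions as well as the dominant one.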

Key words: emoji, sentiment analysis, embedded representation, emotion distribution

CLC Number: TP391

Table 1

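Examples from the WEC Weibo dataset

No. | Example | Emotion
1 | 感谢一切,爱你们 (Grateful for everything, love you all) |
2 | 满满的正月味道,让我不禁思念远在故乡的亲人 (The air is full of the flavor of the first lunar month, and I can't help missing my family far away in my hometown) |
3 | 奋斗的人生才有意义,充实才叫人生 (Only a life of striving has meaning; only a full life is truly living) |
4 | 真不知道要怎么和敷衍对话的人继续聊下去 (I really don't know how to keep talking with someone who replies so perfunctorily) |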

Fig.1

Emotional distribution of [emoji] and [emoji] over 7 emotions

Fig.2

Schematic diagram of the BERT-based sentiment prediction model

Table 2

Experimental datasets

Dataset | No. of emoji | Sentences containing emoji | Total sentences
NLP&CC2013 | 110 | 1,509 | 10,487
NLP&CC2014 | 28 | 637 | 5,918
WEC | 191 | 8,961 | 39,660
Total | 262 | 11,107 | 56,065

Table 3

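Number of sentences labeled with each of the 7 emotions for the 64 emoji

Descriptor | Total sentences | Sentences labeled with each of the 7 emotions
| 1787 | 153 186 27 171 62 1161 27
哈哈 (haha) | 629 | 8 42 4 457 67 38 13
抓狂 (frantic) | 606 | 152 160 14 25 23 224 8
| 539 | 6 8 5 273 168 74 5
| 459 | 273 80 3 9 4 82 8
嘻嘻 (giggle) | 401 | 7 19 3 303 45 22 2
| 352 | 53 80 10 13 3 168 25
| 309 | 53 108 6 25 8 80 29
拜拜 (bye-bye) | 295 | 30 15 1 15 6 222 6
悲伤 (sad) | 293 | 19 21 6 17 5 224 1
偷笑 (snicker) | 276 | 8 28 2 172 33 26 7
伤心 (heartbroken) | 267 | 22 22 4 11 10 196 2
| 263 | 74 84 5 15 3 77 5
生病 (sick) | 258 | 30 53 11 8 6 147 3
呵呵 (hehe) | 239 | 8 20 2 84 44 78 3
可怜 (pitiful) | 208 | 18 24 8 23 11 121 3
失望 (disappointed) | 206 | 12 27 8 6 11 139 3
害羞 (shy) | 187 | 7 20 10 82 37 28 3
蜡烛 (candle) | 185 | 20 12 1 20 16 114 2
可爱 (cute) | 179 | 1 10 2 82 49 30 5
| 177 | 25 53 3 16 6 59 15
月亮 (moon) | 176 | 7 11 3 62 47 44 2
鼓掌 (applause) | 174 | 2 13 0 95 53 9 2
委屈 (aggrieved) | 167 | 13 19 4 9 8 112 2
奥特曼 (Ultraman) | 158 | 9 21 1 49 45 28 5
黑线 (speechless) | 145 | 29 52 1 10 3 38 12
兔子 (rabbit) | 141 | 6 11 0 85 16 20 3
| 130 | 2 20 0 60 29 14 5
泪流满面 (tears streaming) | 126 | 13 10 5 10 5 81 2
怒骂 (cursing) | 119 | 51 28 0 3 2 32 3
太阳 (sun) | 112 | 6 2 0 67 17 17 3
| 110 | 2 2 3 65 30 5 3
吃惊 (surprised) | 109 | 14 22 3 9 2 17 42
鄙视 (contempt) | 106 | 37 38 1 5 4 17 4
思考 (thinking) | 105 | 8 30 2 20 15 19 11
亲亲 (kiss) | 104 | 1 6 0 57 27 12 1
睡觉 (sleeping) | 103 | 8 34 3 21 7 28 2
| 98 | 4 2 0 20 70 2 0
浮云 (fleeting clouds) | 96 | 8 17 0 21 9 39 2
笑哈哈 (laughing) | 92 | 4 5 0 62 11 8 2
花心 (infatuated) | 87 | 2 6 0 42 30 2 5
馋嘴 (drooling) | 85 | 3 14 0 49 9 9 1
威武 (mighty) | 78 | 3 5 0 34 22 9 5
微风 (breeze) | 71 | 1 8 1 35 12 12 2
围观 (onlooking) | 64 | 3 8 0 28 9 14 2
| 62 | 14 28 0 2 1 16 1
做鬼脸 (making a face) | 61 | 2 8 1 32 10 7 1
熊猫 (panda) | 61 | 4 4 1 12 14 21 5
蛋糕 (cake) | 57 | 0 0 0 28 24 4 1
猪头 (pig head) | 57 | 9 10 0 15 12 10 1
崩溃 (breakdown) | 53 | 19 12 1 1 2 18 0
话筒 (microphone) | 52 | 3 8 0 9 18 14 0
愤怒 (angry) | 51 | 35 6 0 1 1 7 1
疑问 (question) | 51 | 8 10 0 6 4 20 3
鲜花 (flowers) | 49 | 1 0 0 22 16 9 1
闭嘴 (shut up) | 46 | 7 7 1 2 2 23 4
神马 (shenma, slang for "what") | 44 | 10 9 0 8 5 11 1
| 42 | 1 20 0 3 8 8 2
| 38 | 16 10 0 0 1 11 0
下雨 (rain) | 37 | 3 6 1 4 4 18 1
悲催 (miserable) | 36 | 0 7 1 1 1 26 0
干杯 (cheers) | 35 | 1 0 0 21 8 5 0
抱抱 (hug) | 32 | 1 1 0 25 5 0 0
| 30 | 5 7 2 0 4 12 0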
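Each row of Table 3 can be read as an empirical emotion distribution by dividing the seven per-emotion counts by the emoji's total. The sketch below does this for the 哈哈 and 悲伤 rows, using counts copied from the table. The emotion column headers are not preserved here, so emotions are referred to by column index (0 to 6); the helper names are illustrative.

```python
from fractions import Fraction

# Per-emotion sentence counts copied from Table 3 (7 emotion columns, in table order)
COUNTS = {
    "哈哈": [8, 42, 4, 457, 67, 38, 13],   # total 629
    "悲伤": [19, 21, 6, 17, 5, 224, 1],    # total 293
}

def empirical_distribution(counts):
    """Normalize raw label counts into exact fractions that sum to 1."""
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

def dominant_column(counts):
    """Index of the emotion column with the most labeled sentences."""
    return max(range(len(counts)), key=lambda j: counts[j])

for name, counts in COUNTS.items():
    dist = empirical_distribution(counts)
    j = dominant_column(counts)
    # name, total sentences, dominant column, share of the dominant emotion
    print(name, sum(counts), j, round(float(dist[j]), 3))
```

Exact fractions make it easy to confirm that each distribution sums to exactly 1. The output also shows that even a strongly polarized emoji such as 哈哈 keeps noticeable mass on other emotions, which is the kind of information a single hard label would discard.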

Table 4

Emotional polarity of the 7 emotions and their numbers of emoji

Emotion | Polarity | No. of emoji
| Positive | 27
| Positive | 2
| Negative | 5
| Negative | 8
| Negative | 21
| Negative | 0
| Ambiguous | 1
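Summing the rows of Table 4 by polarity gives a polarity breakdown of the 64 emoji. This sketch assumes each emoji is counted under exactly one emotion, which the table's total of 64 suggests. The emotion names are not preserved, so only polarity and counts are used.

```python
from collections import Counter

# (polarity, emoji count) rows copied from Table 4, in table order
ROWS = [
    ("positive", 27), ("positive", 2),
    ("negative", 5), ("negative", 8), ("negative", 21), ("negative", 0),
    ("ambiguous", 1),
]

# Accumulate emoji counts per polarity
by_polarity = Counter()
for polarity, n in ROWS:
    by_polarity[polarity] += n

print(dict(by_polarity))          # {'positive': 29, 'negative': 34, 'ambiguous': 1}
print(sum(by_polarity.values()))  # 64
```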

Fig.3

Emotional polarity of the 64 emoji

Fig.4

t-SNE visualization of the 64 emoji vectors under different emoji representation methods

Fig.5

Emotional consistency accuracy of different emoji representation methods

Table 5

Emotion mapping results for the 64 emoji

Model | Accuracy/% | Average accuracy/%
CWV | 0.00 1.67 11.11 0.00 9.52 0.00 | 9.38
DSG | 80.00 8.89 37.04 50.00 42.86 100.00 | 43.75
fastText | 60.00 6.67 3.70 0.00 0.00 100.00 | 7.81
BERT-EDEER | 80.00 8.89 96.30 50.00 90.48 100.00 | 82.81
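Table 5 does not state how the accuracies are computed. The sketch below rests on three assumptions: each emoji is mapped to one predicted emotion, the mapping counts as correct when it matches the emoji's dominant labeled emotion, and the average is taken over all emoji (micro-averaged) rather than over emotions. The function name and the six gold and predicted labels are hypothetical and are not data from the paper.

```python
from collections import defaultdict

def mapping_accuracy(gold, pred):
    """Per-emotion accuracy and micro-averaged accuracy over all emoji, in percent.

    gold, pred: equal-length lists of emotion labels, one entry per emoji.
    Emotions with no gold emoji get no per-emotion score.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for g, p in zip(gold, pred):
        totals[g] += 1
        hits[g] += int(g == p)
    per_emotion = {e: 100.0 * hits[e] / totals[e] for e in totals}
    overall = 100.0 * sum(hits.values()) / len(gold)
    return per_emotion, overall

# Hypothetical labels for six emoji
gold = ["happiness", "happiness", "sadness", "sadness", "anger", "surprise"]
pred = ["happiness", "like", "sadness", "sadness", "anger", "sadness"]
per_emotion, overall = mapping_accuracy(gold, pred)
print(per_emotion)
print(round(overall, 2))  # 66.67
```

With micro-averaging, each emotion's accuracy is weighted by its number of emoji, so an emotion with many emoji dominates the average. A macro average over emotions would instead give every emotion equal weight, including an emotion that has only one emoji.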

Fig.6

Heat map of the association between the 64 emoji and the 7 emotions

Fig.7

Heat map of the association among the 7 emotions
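The captions of Fig.6 and Fig.7 do not say how the association scores are computed. One simple possibility is assumed here purely for illustration and is not presented as the paper's procedure. Each emoji's emotion distribution becomes one row of an emoji-by-emotion matrix (the shape of Fig.6). The association between two emotions is then the cosine similarity of their columns across emoji (the shape of Fig.7). The sketch uses four rows copied from Table 3, with emotions indexed by column because their names are not preserved.

```python
import math

# Raw per-emotion counts for four emoji, copied from Table 3 (7 emotion columns)
COUNTS = {
    "哈哈": [8, 42, 4, 457, 67, 38, 13],
    "悲伤": [19, 21, 6, 17, 5, 224, 1],
    "愤怒": [35, 6, 0, 1, 1, 7, 1],
    "吃惊": [14, 22, 3, 9, 2, 17, 42],
}

def normalize(row):
    """Counts -> emotion distribution (sums to 1)."""
    total = sum(row)
    return [c / total for c in row]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Emoji x emotion matrix: one emotion distribution per emoji (Fig.6-style)
emoji_by_emotion = [normalize(r) for r in COUNTS.values()]

# Emotion x emotion matrix: cosine similarity between emotion columns (Fig.7-style)
columns = list(zip(*emoji_by_emotion))
emotion_by_emotion = [[cosine(a, b) for b in columns] for a in columns]

for row in emotion_by_emotion:
    print(" ".join(f"{x:.2f}" for x in row))
```

Either matrix can be passed directly to a plotting routine to draw a heat map. The emotion-by-emotion matrix is symmetric with ones on the diagonal, so only one triangle carries information.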
