Emoji embedded representation based on emotion distribution

doi:10.6040/j.issn.1671-9352.1.2022.3548

Abstract


This paper proposes EDEER, an emoji embedded representation method based on emotion distribution. EDEER uses the soft labels produced by a BERT-based emotion prediction model to learn emoji embeddings from real data, and it directly models, through an emotion distribution, how strongly an emoji expresses each emotion, so that the embedding carries the emoji's full range of emotional information. Multiple sets of comparative experiments on a Chinese Weibo dataset containing emoji show that the proposed method effectively learns emoji embeddings that are directly related to fine-grained emotions and builds an emoji representation space with high-quality emotional expression.
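The abstract says EDEER learns emoji embeddings from the soft labels of a BERT-based emotion classifier, using an emotion distribution as the target. The toy sketch below illustrates that general idea only; it is not the paper's architecture or training objective. Assumptions, all hypothetical: an emoji's target distribution is the average softmax soft label of the sentences that contain it; each emotion has a fixed vector (the rows of `proj`, taken from a Hadamard matrix purely so that plain gradient descent is well conditioned); and the emoji embedding `e` is fitted by minimizing KL(target ‖ softmax(proj · e)). The logits are made-up stand-ins for classifier output, and `emoji_target`, `hadamard_rows`, and `fit_embedding` are illustrative helpers.

```python
import math

# Emotion order follows the seven columns of Table 3 (怒 恶 惧 乐 爱 悲 惊).
EMOTIONS = ["anger", "disgust", "fear", "joy", "love", "sadness", "surprise"]


def softmax(logits):
    """Numerically stable softmax: turns classifier logits into a soft label."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def emoji_target(sentence_logits):
    """Hypothetical: average the soft labels of all sentences containing one emoji."""
    labels = [softmax(row) for row in sentence_logits]
    return [sum(lab[k] for lab in labels) / len(labels) for k in range(len(EMOTIONS))]


def kl(p, q):
    """KL(p || q) in nats; terms with p_k = 0 contribute nothing."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def hadamard_rows(n_rows, dim):
    """Orthonormal rows from a Sylvester Hadamard matrix (dim must be a power of 2)."""
    h = [[1.0]]
    while len(h) < dim:
        h = [r + r for r in h] + [r + [-x for x in r] for r in h]
    return [[x / math.sqrt(dim) for x in row] for row in h[:n_rows]]


def predict(e, proj):
    """Emotion distribution implied by embedding e: softmax(proj . e)."""
    return softmax([sum(w * x for w, x in zip(row, e)) for row in proj])


def fit_embedding(target, proj, steps=2000, lr=1.0):
    """Plain gradient descent on KL(target || predict(e, proj)) with respect to e."""
    dim = len(proj[0])
    e = [0.0] * dim
    for _ in range(steps):
        diff = [p - t for p, t in zip(predict(e, proj), target)]  # gradient w.r.t. logits
        grad = [sum(proj[k][j] * diff[k] for k in range(len(proj))) for j in range(dim)]
        e = [x - lr * g for x, g in zip(e, grad)]
    return e


# Made-up classifier logits for three sentences that share one emoji.
sentence_logits = [
    [0.1, 0.3, -1.0, 0.2, -0.5, 2.5, -0.8],
    [0.5, 1.2, -0.7, -0.2, -0.9, 1.8, -1.0],
    [-0.3, 0.0, -1.2, 1.0, 0.4, 1.5, -0.6],
]
target = emoji_target(sentence_logits)
proj = hadamard_rows(len(EMOTIONS), 8)  # one fixed 8-d vector per emotion
embedding = fit_embedding(target, proj)
pred = predict(embedding, proj)
print(EMOTIONS[target.index(max(target))], f"KL={kl(target, pred):.2e}")
```

Because the rows of `proj` are orthonormal, each gradient step moves the logits by exactly `-lr * (p - t)`, so the fitted distribution converges to the target; with learned, correlated emotion vectors the learning rate would need more care.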

Key words: emoji, sentiment analysis, embedded representation, emotion distribution

CLC Number: TP391

Xueqiang ZENG, Yu SUN, Ye LIU, Zhongying WAN, Jiali ZUO, Mingwen WANG. Emoji embedded representation based on emotion distribution[J]. JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2024, 59(3): 81-94.

Figures/Tables 12

Table 1

Fig.1

Fig.2

Table 2

Table 3 Number of sentences labeled with each of the 7 emotions, for each of the 64 emoji

Description word	Total sentences	Anger	Disgust	Fear	Joy	Love	Sadness	Surprise
泪 (tears)	1 787	153	186	27	171	62	1 161	27
哈哈 (haha)	629	8	42	4	457	67	38	13
抓狂 (going crazy)	606	152	160	14	25	23	224	8
心 (heart)	539	6	8	5	273	168	74	5
怒 (angry)	459	273	80	3	9	4	82	8
嘻嘻 (giggle)	401	7	19	3	303	45	22	2
衰 (unlucky)	352	53	80	10	13	3	168	25
汗 (sweat)	309	53	108	6	25	8	80	29
拜拜 (bye-bye)	295	30	15	1	15	6	222	6
悲伤 (sad)	293	19	21	6	17	5	224	1
偷笑 (snicker)	276	8	28	2	172	33	26	7
伤心 (heartbroken)	267	22	22	4	11	10	196	2
哼 (hmph)	263	74	84	5	15	3	77	5
生病 (sick)	258	30	53	11	8	6	147	3
呵呵 (heh)	239	8	20	2	84	44	78	3
可怜 (pitiful)	208	18	24	8	23	11	121	3
失望 (disappointed)	206	12	27	8	6	11	139	3
害羞 (shy)	187	7	20	10	82	37	28	3
蜡烛 (candle)	185	20	12	1	20	16	114	2
可爱 (cute)	179	1	10	2	82	49	30	5
晕 (dizzy)	177	25	53	3	16	6	59	15
月亮 (moon)	176	7	11	3	62	47	44	2
鼓掌 (applause)	174	2	13	0	95	53	9	2
委屈 (aggrieved)	167	13	19	4	9	8	112	2
奥特曼 (Ultraman)	158	9	21	1	49	45	28	5
黑线 (black lines)	145	29	52	1	10	3	38	12
兔子 (rabbit)	141	6	11	0	85	16	20	3
酷 (cool)	130	2	20	0	60	29	14	5
泪流满面 (tears streaming)	126	13	10	5	10	5	81	2
怒骂 (cursing)	119	51	28	0	3	2	32	3
太阳 (sun)	112	6	2	0	67	17	17	3
耶 (yeah)	110	2	2	3	65	30	5	3
吃惊 (shocked)	109	14	22	3	9	2	17	42
鄙视 (contempt)	106	37	38	1	5	4	17	4
思考 (thinking)	105	8	30	2	20	15	19	11
亲亲 (kiss)	104	1	6	0	57	27	12	1
睡觉 (sleeping)	103	8	34	3	21	7	28	2
赞 (thumbs up)	98	4	2	0	20	70	2	0
浮云 (floating clouds)	96	8	17	0	21	9	39	2
笑哈哈 (laughing)	92	4	5	0	62	11	8	2
花心 (heart eyes)	87	2	6	0	42	30	2	5
馋嘴 (drooling)	85	3	14	0	49	9	9	1
威武 (mighty)	78	3	5	0	34	22	9	5
微风 (breeze)	71	1	8	1	35	12	12	2
围观 (onlookers)	64	3	8	0	28	9	14	2
吐 (vomit)	62	14	28	0	2	1	16	1
做鬼脸 (making a face)	61	2	8	1	32	10	7	1
熊猫 (panda)	61	4	4	1	12	14	21	5
蛋糕 (cake)	57	0	0	0	28	24	4	1
猪头 (pig head)	57	9	10	0	15	12	10	1
崩溃 (breakdown)	53	19	12	1	1	2	18	0
话筒 (microphone)	52	3	8	0	9	18	14	0
愤怒 (fury)	51	35	6	0	1	1	7	1
疑问 (question)	51	8	10	0	6	4	20	3
鲜花 (flowers)	49	1	0	0	22	16	9	1
闭嘴 (shut up)	46	7	7	1	2	2	23	4
神马 (shenma, "what")	44	10	9	0	8	5	11	1
困 (sleepy)	42	1	20	0	3	8	8	2
弱 (weak)	38	16	10	0	0	1	11	0
下雨 (rain)	37	3	6	1	4	4	18	1
悲催 (miserable)	36	0	7	1	1	1	26	0
干杯 (cheers)	35	1	0	0	21	8	5	0
抱抱 (hug)	32	1	1	0	25	5	0	0
顶 (support)	30	5	7	2	0	4	12	0


Table 4

Fig.3

Fig.4

Fig.5

Table 5

Fig.6

Fig.7

References 38

1	BIRJALI M, KASRI M, HSSANE A B. A comprehensive survey on sentiment analysis: approaches, challenges and trends[J]. Knowledge-Based Systems, 2021, 226: 107134. doi: 10.1016/j.knosys.2021.107134
2	GUPTA S, SINGH A, RANJAN J. Sentiment analysis: usage of text and emoji for expressing sentiments[C]//Advances in Data and Information Sciences: Proceedings of ICDIS 2019. Singapore: Springer, 2020: 477-486.
3	LEE S, JEONG D, PARK E. MultiEmo: multi-task framework for emoji prediction[J]. Knowledge-Based Systems, 2022, 242: 108437. doi: 10.1016/j.knosys.2022.108437
4	TAN Hao, DENG Shuwen, QIAN Tao, et al. A microblog sentiment analysis model based on emoji attention mechanism[J]. Application Research of Computers, 2019, 36(9): 2647-2650. (in Chinese)
5	XIE Lixing, ZHOU Ming, SUN Maosong. Hierarchical structure based hybrid approach to sentiment analysis of Chinese microblog and its feature extraction[J]. Journal of Chinese Information Processing, 2012, 26(1): 73-84. (in Chinese)
6	EISNER B, ROCKTÄSCHEL T, AUGENSTEIN I, et al. Emoji2vec: learning emoji representations from their description[C]//Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media. Stroudsburg: ACL, 2016: 48-54.
7	GROVER V. Exploiting emojis in sentiment analysis: a survey[J]. Journal of the Institution of Engineers (India): Series B, 2021, 103(1): 1-14.
8	WIJERATNE S, BALASURIYA L, SHETH A, et al. A semantics-based measure of emoji similarity[C]//Proceedings of the International Conference on Web Intelligence. New York: ACM, 2017: 646-653.
9	BARBIERI F, RONZANO F, SAGGION H. What does this emoji mean? a vector space skip-gram model for Twitter emojis[C]//Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Slovenia: ELRA, 2016: 3967-3972.
10	LI M, GUNTUKU S, JAKHETIYA V, et al. Exploring (dis-) similarities in emoji-emotion association on Twitter and Weibo[C]//Companion proceedings of the 2019 world wide web conference. New York: ACM, 2019: 461-467.
11	SHOEB A A M, DE MELO G. Emotag1200: understanding the association between emojis and emotions[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2020: 8957-8967.
12	WANG Wenyuan, WANG Daling, FENG Shi, et al. A sentiment dictionary construction and application of microblog emoji sentiment dictionary for sentiment analysis[J]. Computer and Digital Engineering, 2012, 40(11): 6-9. (in Chinese)
13	NOVAK P K, SMAILOVIĆ J, SLUBAN B, et al. Sentiment of emojis[J]. PLoS One, 2015, 10(12): e0144296. doi: 10.1371/journal.pone.0144296
14	LI D, RZEPKA R, PTASZYNSKI M, et al. HEMOS: a novel deep learning-based fine-grained humor detecting method for sentiment analysis of social media[J]. Information Processing & Management, 2020, 57(6): 102290.
15	LI M, LONG Y, QIN L, et al. Emotion corpus construction based on selection from hashtags[C]//Proceedings of the Tenth International Conference on Language Resources and Evaluation. Slovenia: ELRA, 2016: 1845-1849.
16	HE Yanxiang, SUN Songtao, NIU Feifei, et al. A deep learning model enhanced with emotion semantics for microblog sentiment analysis[J]. Chinese Journal of Computers, 2017, 40(4): 18. (in Chinese)
17	FELBO B, MISLOVE A, SØGAARD A, et al. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2017: 1615-1625.
18	SINGH A, BLANCO E, JIN W. Incorporating emoji descriptions improves tweet classification[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2019: 2096-2101.
19	DIMSON T. Emojineering part 1: machine learning for emoji trends[J]. Instagram Engineering Blog, 2015, 30: 1-10.
20	KIMURA M, KATSURAI M. Automatic construction of an emoji sentiment lexicon[C]//Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. New York: ACM, 2017: 1033-1036.
21	ZHOU Y, XUE H, GENG X. Emotion distribution recognition from facial expressions[C]//Proceedings of the 23rd ACM International Conference on Multimedia. New York: ACM, 2015: 1247-1250.
22	ZENG Xueqiang, LUO Mingzhu, CHEN Sufen, et al. The facial age estimation based on adaptive multivariate multiple regression[J]. Journal of Jiangxi Normal University (Natural Sciences Edition), 2019, 43(1): 68-75. (in Chinese)
23	ZHAO Z, MA X. Text emotion distribution learning from small sample: a meta-learning approach[C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2019: 3955-3965.
24	ZHOU D, QUOST B, FRÉMONT V. Soft label based semi-supervised boosting for classification and object recognition[C]//2014 13th International Conference on Control Automation Robotics & Vision. Piscataway: IEEE, 2014: 1062-1067.
25	FAYEK H M, LECH M, CAVEDON L. Modeling subjectiveness in emotion recognition with deep neural networks: ensembles vs soft labels[C]//2016 International Joint Conference on Neural Networks. Piscataway: IEEE, 2016: 566-570.
26	ZHAO Z, WU S, YANG M, et al. Robust machine reading comprehension by learning soft labels[C]//Proceedings of the 28th International Conference on Computational Linguistics. Berlin: ICCL, 2020: 2754-2759.
27	FORNACIARI T, UMA A, PAUN S, et al. Beyond black & white: leveraging annotator disagreement via soft-label multi-task learning[C]//Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 2591-2597.
28	WANG X, ZONG C. Distributed representations of emotion categories in emotion space[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2021: 2364-2375.
29	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2019: 4171-4186.
30	YAO Yuanlin, WANG Shuwei, XU Ruifeng, et al. The construction of an emotion annotated corpus on microblog text[J]. Journal of Chinese Information Processing, 2014, 28(5): 83-91. (in Chinese)
31	LI S, ZHAO Z, HU R, et al. Analogical reasoning on Chinese morphological and semantic relations[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2018: 138-143.
32	DEMSZKY D, MOVSHOVITZ-ATTIAS D, KO J, et al. GoEmotions: a dataset of fine-grained emotions[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 4040-4054.
33	KIM Y. Convolutional neural networks for sentence classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2014: 1746-1751.
34	SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
35	VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.
36	SONG Y, SHI S, LI J, et al. Directional skip-gram: explicitly distinguishing left and right context for word embeddings[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics. Stroudsburg: ACL, 2018: 175-180.
37	JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification[C]//Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Stroudsburg: ACL, 2017: 427-431.
38	TANG D, WEI F, YANG N, et al. Learning sentiment-specific word embedding for Twitter sentiment classification[C]//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2014: 1555-1565.


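No.	Example	Emotion
1	感谢一切，爱你们 (Thankful for everything; love you all)	Joy
2	满满的正月味道，让我不禁思念远在故乡的亲人 (The air is full of the flavor of the first lunar month, and I can't help missing my family far away in my hometown)	Sadness
3	奋斗的人生才有意义，充实才叫人生 (Only a life of striving has meaning; only a full life is truly living)	Love
4	真不知道要怎么和敷衍对话的人继续聊下去 (I really don't know how to keep talking with someone who only gives perfunctory replies)	Disgust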

Dataset	Number of emoji	Sentences containing emoji	Total sentences
NLP&CC2013	110	1 509	10 487
NLP&CC2014	28	637	5 918
WEC	191	8 961	39 660
Total	262	11 107	56 065
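A quick arithmetic check of the dataset table: the two sentence columns add up exactly to the totals, while the per-dataset emoji counts sum to 329 rather than 262. One plausible reading, assumed here and not stated on the page, is that the total counts distinct emoji, since the same emoji can occur in more than one dataset.

```python
# Rows of the dataset table: (number of emoji, sentences containing emoji, total sentences).
DATASETS = {
    "NLP&CC2013": (110, 1509, 10487),
    "NLP&CC2014": (28, 637, 5918),
    "WEC": (191, 8961, 39660),
}
REPORTED_TOTAL = (262, 11107, 56065)

# Column sums across the three datasets.
emoji_sum, with_emoji, sentences = (sum(col) for col in zip(*DATASETS.values()))
share = with_emoji / sentences

print(with_emoji == REPORTED_TOTAL[1], sentences == REPORTED_TOTAL[2])  # sentence columns add up
print(emoji_sum, "vs", REPORTED_TOTAL[0])  # 329 vs 262: the emoji column does not
print(f"Share of sentences containing emoji: {share:.1%}")
```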

Model	Anger/%	Disgust/%	Joy/%	Love/%	Sadness/%	Surprise/%	Average accuracy/%
CWV	0.00	1.67	11.11	0.00	9.52	0.00	9.38
DSG	80.00	8.89	37.04	50.00	42.86	100.00	43.75
fastText	60.00	6.67	3.70	0.00	0.00	100.00	7.81
BERT-EDEER	80.00	8.89	96.30	50.00	90.48	100.00	82.81
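The last column of the accuracy table is not the unweighted mean of the per-emotion columns: for BERT-EDEER that mean is about 70.9%, not 82.81%. Every reported average does, however, lie within rounding of a multiple of 1/64, which is consistent with (though it does not prove) an average taken over the 64 emoji of Table 3, i.e. weighted by how many emoji fall under each emotion. The check below assumes that reading; `implied_hits` is a hypothetical helper.

```python
N_EMOJI = 64  # number of emoji listed in Table 3

# "Average accuracy/%" column of the comparison table.
REPORTED_AVG = {"CWV": 9.38, "DSG": 43.75, "fastText": 7.81, "BERT-EDEER": 82.81}
# BERT-EDEER per-emotion accuracies/% (anger, disgust, joy, love, sadness, surprise).
BERT_EDEER_ROW = [80.00, 8.89, 96.30, 50.00, 90.48, 100.00]


def implied_hits(avg_pct, n=N_EMOJI):
    """Hypothetical helper: nearest hit count k, and whether 100*k/n rounds to avg_pct."""
    hits = round(avg_pct * n / 100)
    # Allow half a unit in the last reported digit, plus a little float slack.
    return hits, abs(100 * hits / n - avg_pct) < 0.0051


results = {model: implied_hits(avg) for model, avg in REPORTED_AVG.items()}
for model, (hits, consistent) in results.items():
    print(f"{model}: {REPORTED_AVG[model]:.2f}% vs {hits}/{N_EMOJI}, consistent={consistent}")

macro = sum(BERT_EDEER_ROW) / len(BERT_EDEER_ROW)
print(f"Unweighted mean of the BERT-EDEER row: {macro:.2f}% (reported average: 82.81%)")
```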



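Emotion	Sentiment polarity	Number of emoji
Joy	Positive	27
Love	Positive	2
Anger	Negative	5
Disgust	Negative	8
Sadness	Negative	21
Fear	Negative	0
Surprise	Ambiguous	1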
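The counts in the emotion/polarity table sum to 64, the number of emoji in Table 3, and they are reproduced exactly if each emoji is assigned the emotion with the most labeled sentences in its Table 3 row. The sketch below performs that majority assignment and then groups emotions by polarity; the majority rule is an inferred reading that matches the numbers, not a procedure the page states.

```python
from collections import Counter

EMOTIONS = ["anger", "disgust", "fear", "joy", "love", "sadness", "surprise"]
POLARITY = {"joy": "positive", "love": "positive", "anger": "negative", "disgust": "negative",
            "sadness": "negative", "fear": "negative", "surprise": "ambiguous"}

# Table 3 rows (counts in EMOTIONS order), two per line as in the original two-column layout.
TABLE3 = """
泪 153 186 27 171 62 1161 27; 吃惊 14 22 3 9 2 17 42
哈哈 8 42 4 457 67 38 13; 鄙视 37 38 1 5 4 17 4
抓狂 152 160 14 25 23 224 8; 思考 8 30 2 20 15 19 11
心 6 8 5 273 168 74 5; 亲亲 1 6 0 57 27 12 1
怒 273 80 3 9 4 82 8; 睡觉 8 34 3 21 7 28 2
嘻嘻 7 19 3 303 45 22 2; 赞 4 2 0 20 70 2 0
衰 53 80 10 13 3 168 25; 浮云 8 17 0 21 9 39 2
汗 53 108 6 25 8 80 29; 笑哈哈 4 5 0 62 11 8 2
拜拜 30 15 1 15 6 222 6; 花心 2 6 0 42 30 2 5
悲伤 19 21 6 17 5 224 1; 馋嘴 3 14 0 49 9 9 1
偷笑 8 28 2 172 33 26 7; 威武 3 5 0 34 22 9 5
伤心 22 22 4 11 10 196 2; 微风 1 8 1 35 12 12 2
哼 74 84 5 15 3 77 5; 围观 3 8 0 28 9 14 2
生病 30 53 11 8 6 147 3; 吐 14 28 0 2 1 16 1
呵呵 8 20 2 84 44 78 3; 做鬼脸 2 8 1 32 10 7 1
可怜 18 24 8 23 11 121 3; 熊猫 4 4 1 12 14 21 5
失望 12 27 8 6 11 139 3; 蛋糕 0 0 0 28 24 4 1
害羞 7 20 10 82 37 28 3; 猪头 9 10 0 15 12 10 1
蜡烛 20 12 1 20 16 114 2; 崩溃 19 12 1 1 2 18 0
可爱 1 10 2 82 49 30 5; 话筒 3 8 0 9 18 14 0
晕 25 53 3 16 6 59 15; 愤怒 35 6 0 1 1 7 1
月亮 7 11 3 62 47 44 2; 疑问 8 10 0 6 4 20 3
鼓掌 2 13 0 95 53 9 2; 鲜花 1 0 0 22 16 9 1
委屈 13 19 4 9 8 112 2; 闭嘴 7 7 1 2 2 23 4
奥特曼 9 21 1 49 45 28 5; 神马 10 9 0 8 5 11 1
黑线 29 52 1 10 3 38 12; 困 1 20 0 3 8 8 2
兔子 6 11 0 85 16 20 3; 弱 16 10 0 0 1 11 0
酷 2 20 0 60 29 14 5; 下雨 3 6 1 4 4 18 1
泪流满面 13 10 5 10 5 81 2; 悲催 0 7 1 1 1 26 0
怒骂 51 28 0 3 2 32 3; 干杯 1 0 0 21 8 5 0
太阳 6 2 0 67 17 17 3; 抱抱 1 1 0 25 5 0 0
耶 2 2 3 65 30 5 3; 顶 5 7 2 0 4 12 0
"""


def parse_rows(text):
    """Parse 'name c1 ... c7' records separated by newlines or semicolons."""
    rows = {}
    for record in text.replace("\n", ";").split(";"):
        if record.strip():
            name, *counts = record.split()
            rows[name] = [int(c) for c in counts]
    return rows


def majority_emotion(counts):
    """Emotion with the most labeled sentences (no Table 3 row has a tie at the top)."""
    return EMOTIONS[counts.index(max(counts))]


rows = parse_rows(TABLE3)
by_emotion = Counter(majority_emotion(c) for c in rows.values())
by_polarity = Counter()
for emotion, n in by_emotion.items():
    by_polarity[POLARITY[emotion]] += n

for emotion in EMOTIONS:
    print(f"{emotion}: {by_emotion[emotion]}")  # missing keys (fear) count as 0
print(dict(by_polarity))
```

Grouping by polarity then gives 29 positive, 34 negative, and 1 ambiguous emoji.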