JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2024, Vol. 59 ›› Issue (7): 105-112. doi: 10.6040/j.issn.1671-9352.1.2023.055

• Review •

Multimodal conversation emotion recognition based on clustering and group normalization

Qi LUO1, Gang GOU2,*

  1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, Guizhou, China
    2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
  • Received: 2023-11-24 Online: 2024-07-20 Published: 2024-07-15
  • Contact: Gang GOU E-mail: gs.luoq21@gzu.edu.cn; ggou@gzu.edu.cn

Abstract:

A major challenge in multimodal conversational emotion recognition is that similar emotion categories are easily confused, which degrades recognition performance. To address this problem, a relational graph neural network modeling approach based on clustering and group normalization is proposed. First, features of the three modalities are extracted with three different feature extractors and concatenated together with a speaker encoding, which enriches the feature representation while preserving the original information. Second, contextual information is extracted with a Transformer. Finally, after the feature nodes are fed into a relational graph convolutional network, the nodes are grouped by clustering and each group is normalized independently, making similar nodes more alike and alleviating the difficulty of separating similar emotions. Experiments show that the model reaches an F1-score of 86.34% on the four-class classification task of the IEMOCAP dataset, which verifies the effectiveness of the proposed method; at present, this is the best performance reported on this dataset.
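The clustering-based group normalization described above can be illustrated with a minimal sketch (not the authors' implementation): node embeddings produced by the relational graph convolution are grouped by clustering, and each group is standardized independently so that nodes within a group become more similar. The function name, the group count, and the use of hard k-means assignments are illustrative assumptions.

```python
# Minimal, illustrative sketch of cluster-wise group normalization over graph
# node features, assuming hard k-means assignments (hypothetical helper).
import torch
from sklearn.cluster import KMeans

def cluster_group_normalize(node_feats: torch.Tensor, num_groups: int = 4,
                            eps: float = 1e-5) -> torch.Tensor:
    # Group nodes by k-means on their current embeddings.
    labels = KMeans(n_clusters=num_groups, n_init=10).fit_predict(
        node_feats.detach().cpu().numpy())
    labels = torch.as_tensor(labels, device=node_feats.device)

    out = node_feats.clone()
    for g in range(num_groups):
        mask = labels == g
        if mask.sum() < 2:          # skip degenerate groups
            continue
        group = node_feats[mask]
        mean = group.mean(dim=0, keepdim=True)
        std = group.std(dim=0, keepdim=True)
        out[mask] = (group - mean) / (std + eps)   # per-group standardization
    return out

# Usage: applied to the utterance-node embeddings produced by a relational
# graph convolution layer before classification.
h = torch.randn(128, 64)            # 128 utterance nodes, 64-dim features
h_norm = cluster_group_normalize(h, num_groups=4)
```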

Key words: graph neural network, feature fusion, group normalization, cluster, conversation emotion recognition

CLC Number: 

  • TP391

Fig.1

Architecture diagram of multimodal conversational emotion recognition model based on clustering and group normalization

Fig.2

Diagram of feature extraction module

Fig.3

DRG module diagram

Table 1

IEMOCAP dataset

IEMOCAP  Training set  Validation set  Test set
Number of dialogues (number of utterances)  108 (5 146)  12 (664)  31 (1 623)

Table 2

Results for different emotion categories  Unit: %

Emotion category  Precision  Recall  F1-score
Happy  78.57  84.03  81.21
Sad  85.77  91.02  88.32
Neutral  91.62  82.55  86.85
Anger  83.61  90.00  86.69
Macro avg  84.89  86.90  85.76
Weighted avg  86.66  86.32  86.34
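For clarity, the "Macro avg" and "Weighted avg" rows differ only in how per-class F1-scores are averaged: the macro average weights all four emotion classes equally, while the weighted average weights each class by its number of test utterances. A minimal sketch with scikit-learn, using placeholder labels rather than the actual IEMOCAP test split:

```python
# Macro-F1 averages per-class F1 scores equally; weighted-F1 weights them by
# class support. The labels below are placeholders for illustration only.
from sklearn.metrics import f1_score

y_true = ["happy", "sad", "neutral", "anger", "sad", "neutral"]
y_pred = ["happy", "sad", "neutral", "neutral", "sad", "anger"]

macro_f1 = f1_score(y_true, y_pred, average="macro")       # unweighted mean over classes
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # support-weighted mean
print(f"macro-F1={macro_f1:.4f}, weighted-F1={weighted_f1:.4f}")
```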

Table 3

Accuracy and F1-score of different models  Unit: %

Model  Accuracy  F1-score
bc-LSTM[18]  75.10  74.10
CHFusion[19]  76.80  76.50
MMGCN[7]  78.26  78.66
MM-DFN[20]  79.64  79.60
PATHOSnet V2[21]  78.00  80.40
COGMEN[10]  84.62  84.53
DRG (ours)  86.32  86.34

Table 4

Comparison of the F1-score results of the two models on the IEMOCAP dataset  Unit: %

Model  Happy  Sad  Neutral  Anger  Acc avg  Weighted avg
COGMEN  81.41  88.00  82.79  86.13  84.62  84.53
DRG  81.21  88.32  86.85  86.69  86.32  86.34

Fig.4

Visualization of feature nodes using the t-SNE method
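Fig. 4 projects the high-dimensional utterance node embeddings into two dimensions with t-SNE[22] so that the class structure can be inspected visually. A minimal sketch of such a projection, with randomly generated embeddings and labels standing in for the model's actual features:

```python
# Reduce node embeddings to 2-D with t-SNE and colour points by emotion label.
# Array shapes and labels here are illustrative, not the model's real outputs.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

embeddings = np.random.randn(400, 64)        # 400 utterance nodes, 64-dim features
labels = np.random.randint(0, 4, size=400)   # 4 emotion classes (placeholder)

coords = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=8, cmap="tab10")
plt.title("t-SNE of utterance node embeddings")
plt.show()
```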

Fig.5

Confusion matrix comparison of recognition results before and after using DRG module

Table 5

Comparison of results for different modality combinations  Unit: %

Modality  F1-score
Audio  64.42
Video  48.45
Text  82.81
Video + Text  83.12
Audio + Text  86.05
Video + Audio + Text  86.34
1 JORDAN M I. Serial order: a parallel distributed processing approach[J]. Advances in Psychology, 1997, 121: 471-495.
2 KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22)[2023-11-24]. http://arxiv.org/abs/1609.02907.
3 MAJUMDER N, PORIA S, HAZARIKA D, et al. DialogueRNN: an attentive RNN for emotion detection in conversations[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 6818-6825.
doi: 10.1609/aaai.v33i01.33016818
4 CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. (2014-12-11)[2023-11-24]. http://arxiv.org/abs/1412.3555.
5 GHOSAL D, MAJUMDER N, GELBUKH A, et al. COSMIC: commonsense knowledge for emotion identification in conversations[EB/OL]. (2020-10-06)[2023-11-24]. http://arxiv.org/abs/2010.02795.
6 GHOSAL D, MAJUMDER N, PORIA S, et al. DialogueGCN: a graph convolutional neural network for emotion recognition in conversation[EB/OL]. (2019-08-30)[2023-11-24]. http://arxiv.org/abs/1908.11540.
7 HU Jingwen, LIU Yuchen, ZHAO Jinming, et al. MMGCN: multimodal fusion via deep graph convolution network for emotion recognition in conversation[EB/OL]. (2021-07-14)[2023-11-24]. http://arxiv.org/abs/2107.06779.
8 CHEN Ming, WEI Zhewei, HUANG Zengfeng, et al. Simple and deep graph convolutional networks[C]//International Conference on Machine Learning. [S. l. ]: ACM, 2020: 1725-1735.
9 ZHOU Kaixiong, HUANG Xiao, LI Yuening, et al. Towards deeper graph neural networks with differentiable group normalization[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2020: 4917-4928.
10 JOSHI A, BHAT A, JAIN A, et al. COGMEN: contextualized GNN based multimodal emotion recognition[EB/OL]. (2022-05-05)[2023-11-24]. http://arxiv.org/abs/2205.02455.
11 SCHLICHTKRULL M, KIPF T N, BLOEM P, et al. Modeling relational data with graph convolutional networks[M]//The Semantic Web. Cham: Springer, 2018: 593-607.
12 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: ACM, 2017: 6000-6010.
13 BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42(4): 335-359.
doi: 10.1007/s10579-008-9076-6
14 EYBEN F, WÖLLMER M, SCHULLER B. Opensmile: the Munich versatile and fast open-source audio feature extractor[C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze: ACM, 2010: 1459-1462.
15 BALTRUŠAITIS T, ROBINSON P, MORENCY L P. OpenFace: an open source facial behavior analysis toolkit[C]//2016 IEEE Winter Conference on Applications of Computer Vision (WACV). New York: IEEE, 2016: 1-10.
16 REIMERS N, GUREVYCH I. Sentence-BERT: sentence embeddings using siamese BERT-networks[EB/OL]. (2019-08-27)[2023-12-24]. http://arxiv.org/abs/1908.10084.
17 CAI D, LAM W. Graph transformer for graph-to-sequence learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 7464-7471.
doi: 10.1609/aaai.v34i05.6243
18 PORIA S, CAMBRIA E, HAZARIKA D, et al. Context-dependent sentiment analysis in user-generated videos[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: Association for Computational Linguistics, 2017: 873-883.
19 MAJUMDER N, HAZARIKA D, GELBUKH A, et al. Multimodal sentiment analysis using hierarchical fusion with context modeling[J]. Knowledge-Based Systems, 2018, 161: 124-133.
doi: 10.1016/j.knosys.2018.07.041
20 HU Dou, HOU Xiaolong, WEI Lingwei, et al. MM-DFN: multimodal dynamic fusion network for emotion recognition in conversations[C]//ICASSP 2022: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Singapore: IEEE, 2022: 7037-7041.
21 SCOTTI V, GALATI F, SBATTELLA L, et al. Combining deep and unsupervised features for multilingual speech emotion recognition[C]//Pattern Recognition: ICPR International Workshops and Challenges. Cham: Springer, 2021: 114-128.
22 VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(86): 2579-2605.