Multimodal conversation emotion recognition based on clustering and group normalization

doi:10.6040/j.issn.1671-9352.1.2023.055

Abstract

Confusion between similar emotion categories degrades recognition performance and remains a major challenge in multimodal emotion recognition. To address this problem, a relational graph neural network model based on clustering and group normalization is proposed. First, three different feature extractors are used to extract features for the three modalities; the features are combined with speaker encodings and then concatenated, which enriches the feature representation while preserving the original information. Second, a Transformer extracts contextual information. Finally, the feature nodes are fed into a relational graph convolutional network, after which the nodes are clustered into groups and each group is normalized independently, so that similar nodes become more similar and the confusion between similar emotions is alleviated. In experiments, the proposed model reaches an F1 score of 86.34% on the four-class IEMOCAP task, which supports the effectiveness of the method. At present, the model achieves the best performance on the IEMOCAP dataset.

Key words: graph neural network, feature fusion, group normalization, clustering, conversation emotion recognition
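The final step described in the abstract, clustering the graph nodes into groups and normalizing each group independently, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `assign_groups` and `group_normalize`, the toy 2-D features, the fixed centroids, and the hard nearest-centroid assignment, which stands in for whatever clustering the paper actually uses. Learnable scale and shift parameters and any residual connection are omitted.

```python
import math
from collections import defaultdict


def assign_groups(nodes, centroids):
    """Hard-assign each node to its nearest centroid (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda g: sq_dist(node, centroids[g]))
        for node in nodes
    ]


def group_normalize(nodes, groups, eps=1e-5):
    """Standardize every feature dimension separately within each group."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)

    out = [None] * len(nodes)
    for idxs in members.values():
        dim = len(nodes[idxs[0]])
        # Per-group, per-dimension mean and (biased) variance.
        means = [sum(nodes[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        variances = [
            sum((nodes[i][d] - means[d]) ** 2 for i in idxs) / len(idxs)
            for d in range(dim)
        ]
        for i in idxs:
            out[i] = [
                (nodes[i][d] - means[d]) / math.sqrt(variances[d] + eps)
                for d in range(dim)
            ]
    return out


# Toy node features: two tight clusters in 2-D (made-up values, not from the paper).
nodes = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.5, 5.5], [4.5, 6.5]]
# Hypothetical centroids, assumed to come from a separate clustering step.
centroids = [[1.0, 2.0], [5.0, 6.0]]

groups = assign_groups(nodes, centroids)
normalized = group_normalize(nodes, groups)
```

Here the statistics are computed per group rather than over all nodes at once, which is the point of normalizing groups independently. Differentiable variants such as the one in reference [9] replace the hard assignment with soft, learnable group memberships.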

CLC number: TP391

LUO Qi, GOU Gang. Multimodal conversation emotion recognition based on clustering and group normalization[J]. Journal of Shandong University (Natural Science), 2024, 59(7): 105-112.


References

1	JORDAN M I. Serial order: a parallel distributed processing approach[J]. Advances in Psychology, 1997, 121: 471-495.
2	KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22)[2023-11-24]. http://arxiv.org/abs/1609.02907.
3	MAJUMDER N, PORIA S, HAZARIKA D, et al. DialogueRNN: an attentive RNN for emotion detection in conversations[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 6818-6825. doi: 10.1609/aaai.v33i01.33016818
4	CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. (2014-12-11)[2023-11-24]. http://arxiv.org/abs/1412.3555.
5	GHOSAL D, MAJUMDER N, GELBUKH A, et al. COSMIC: commonsense knowledge for emotion identification in conversations[EB/OL]. (2020-10-06)[2023-11-24]. http://arxiv.org/abs/2010.02795.
6	GHOSAL D, MAJUMDER N, PORIA S, et al. DialogueGCN: a graph convolutional neural network for emotion recognition in conversation[EB/OL]. (2019-08-30)[2023-11-24]. http://arxiv.org/abs/1908.11540.
7	HU Jingwen, LIU Yuchen, ZHAO Jinming, et al. MMGCN: multimodal fusion via deep graph convolution network for emotion recognition in conversation[EB/OL]. (2021-07-14)[2023-11-24]. http://arxiv.org/abs/2107.06779.
8	CHEN Ming, WEI Zhewei, HUANG Zengfeng, et al. Simple and deep graph convolutional networks[C]//International Conference on Machine Learning. [S. l.]: ACM, 2020: 1725-1735.
9	ZHOU Kaixiong, HUANG Xiao, LI Yuening, et al. Towards deeper graph neural networks with differentiable group normalization[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2020: 4917-4928.
10	JOSHI A, BHAT A, JAIN A, et al. COGMEN: contextualized GNN based multimodal emotion recognition[EB/OL]. (2022-05-05)[2023-11-24]. http://arxiv.org/abs/2205.02455.
11	SCHLICHTKRULL M, KIPF T N, BLOEM P, et al. Modeling relational data with graph convolutional networks[M]//The Semantic Web. Cham: Springer, 2018: 593-607.
12	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: ACM, 2017: 6000-6010.
13	BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42(4): 335-359. doi: 10.1007/s10579-008-9076-6
14	EYBEN F, WÖLLMER M, SCHULLER B. openSMILE: the Munich versatile and fast open-source audio feature extractor[C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze: ACM, 2010: 1459-1462.
15	BALTRUŠAITIS T, ROBINSON P, MORENCY L P. OpenFace: an open source facial behavior analysis toolkit[C]//2016 IEEE Winter Conference on Applications of Computer Vision (WACV). New York: IEEE, 2016: 1-10.
16	REIMERS N, GUREVYCH I. Sentence-BERT: sentence embeddings using siamese BERT-networks[EB/OL]. (2019-08-27)[2023-12-24]. http://arxiv.org/abs/1908.10084.
17	CAI D, LAM W. Graph transformer for graph-to-sequence learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 7464-7471. doi: 10.1609/aaai.v34i05.6243
18	PORIA S, CAMBRIA E, HAZARIKA D, et al. Context-dependent sentiment analysis in user-generated videos[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: Association for Computational Linguistics, 2017: 873-883.
19	MAJUMDER N, HAZARIKA D, GELBUKH A, et al. Multimodal sentiment analysis using hierarchical fusion with context modeling[J]. Knowledge-Based Systems, 2018, 161: 124-133. doi: 10.1016/j.knosys.2018.07.041
20	HU Dou, HOU Xiaolong, WEI Lingwei, et al. MM-DFN: multimodal dynamic fusion network for emotion recognition in conversations[C]//ICASSP 2022: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Singapore: IEEE, 2022: 7037-7041.
21	SCOTTI V, GALATI F, SBATTELLA L, et al. Combining deep and unsupervised features for multilingual speech emotion recognition[C]//Pattern Recognition: ICPR International Workshops and Challenges. Cham: Springer, 2021: 114-128.
22	VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(86): 2579-2605.



IEMOCAP	Training set	Validation set	Test set
Dialogues (utterances)	108 (5,146)	12 (664)	31 (1,623)

Emotion category	Precision (%)	Recall (%)	F1 score (%)
Happy	78.57	84.03	81.21
Sad	85.77	91.02	88.32
Neutral	91.62	82.55	86.85
Anger	83.61	90.00	86.69
Macro avg	84.89	86.90	85.76
Weighted avg	86.66	86.32	86.34
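The per-class F1 values follow from precision and recall as their harmonic mean, F1 = 2PR / (P + R). The short Python check below recomputes them from the precision and recall columns of the table above; the helper name `f1_score` and the dictionary layout are illustrative choices, not taken from the paper. The weighted average is not recomputed because the per-class utterance counts are not given here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, in percent, copied from the table above.
per_class = {
    "Happy": (78.57, 84.03),
    "Sad": (85.77, 91.02),
    "Neutral": (91.62, 82.55),
    "Anger": (83.61, 90.00),
}

f1 = {label: f1_score(p, r) for label, (p, r) in per_class.items()}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1.values()) / len(f1)

for label, value in f1.items():
    print(f"{label}: {value:.2f}")
print(f"Macro avg: {macro_f1:.2f}")
```

Recomputed this way, the values round to 81.21, 88.32, 86.85 and 86.69 for Happy, Sad, Neutral and Anger. The macro average comes out within about 0.01 of the reported 85.76, a gap that rounding of the published precision and recall values can explain.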

Model	Accuracy (%)	F1 score (%)
bc-LSTM^[18]	75.10	74.10
CHFusion^[19]	76.80	76.50
MMGCN	78.26	78.66
MM-DFN^[20]	79.64	79.60
PATHOSnet V2^[21]	78.00	80.40
COGMEN	84.62	84.53
DRG (this paper)	86.32	86.34

Model	Happy	Sad	Neutral	Anger	Accuracy	Weighted avg F1
COGMEN	81.41	88.00	82.79	86.13	84.62	84.53
DRG	81.21	88.32	86.85	86.69	86.32	86.34

Modality	F1 score (%)
Audio	64.42
Video	48.45
Text	82.81
Video + text	83.12
Audio + text	86.05
Video + audio + text	86.34