Journal of Shandong University (Natural Science) ›› 2024, Vol. 59 ›› Issue (7): 105-112. doi: 10.6040/j.issn.1671-9352.1.2023.055

• Review •

Multimodal conversation emotion recognition based on clustering and group normalization

Qi LUO1, Gang GOU2,*

  1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, Guizhou, China
  2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
  • Received: 2023-11-24  Online: 2024-07-20  Published: 2024-07-15
  • Contact: Gang GOU  E-mail: gs.luoq21@gzu.edu.cn; ggou@gzu.edu.cn
  • About the author: Qi LUO (born 1999), male, master's student; research interests: natural language processing and affective computing. E-mail: gs.luoq21@gzu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62162010); Guizhou Provincial Science and Technology Support Program (Qiankehe Zhicheng [2022] General 267)

Abstract:

Confusion between similar emotion categories degrades recognition performance and remains a major challenge in multimodal emotion recognition. To address this problem, a relational graph neural network model based on clustering and group normalization is proposed. First, three different feature extractors extract features for the three modalities; speaker encodings are incorporated and the features are concatenated, which enriches the feature representation while preserving the original information. Second, a Transformer extracts contextual information. Finally, the feature nodes are fed into a relational graph convolutional network, after which the nodes are clustered into groups and each group is normalized independently, making similar nodes more similar and alleviating the confusion between similar emotions. In experiments, the proposed model reaches an F1-score of 86.34% on the four-class IEMOCAP task, which supports the effectiveness of the method. The authors report this as the best performance on this dataset at the time of writing.

Key words: graph neural network, feature fusion, group normalization, clustering, conversation emotion recognition
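The final step of the abstract (cluster the nodes, then normalize each group independently) can be illustrated with a minimal sketch. It assumes hard k-means assignments and plain per-group standardization of each feature dimension; the paper's DRG module is not specified at this level of detail here and may differ, for example by using a differentiable soft assignment in the spirit of reference 9. All function names and the toy node features are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Deterministic init: start at points[0], then keep adding the farthest point."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans_labels(points, k, iters=10):
    """Hard k-means (an assumption of this sketch); returns one group index per node."""
    centers = farthest_first_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each node joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a group is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_normalize(points, labels, eps=1e-5):
    """Standardize every feature dimension separately inside each group."""
    dims = len(points[0])
    out = [[0.0] * dims for _ in points]
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        for d in range(dims):
            vals = [points[i][d] for i in idx]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            for i in idx:
                out[i][d] = (points[i][d] - mean) / math.sqrt(var + eps)
    return out


# Toy node features: two well-separated groups of three nodes each.
nodes = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels = kmeans_labels(nodes, k=2)
normalized = group_normalize(nodes, labels)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the statistics are computed per group, nodes are rescaled only against the other members of their own cluster, which is the property the abstract relies on; how the normalized features are combined with the original node features is left out of this sketch.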

CLC Number:

  • TP391

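Fig. 1 Architecture of the multimodal conversation emotion recognition model based on clustering and group normalization

Fig. 2 Feature extraction module

Fig. 3 DRG module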

Table 1 The IEMOCAP dataset

IEMOCAP                  Training set   Validation set   Test set
Dialogues (utterances)   108 (5,146)    12 (664)         31 (1,623)

Table 2 Results for different emotion categories (%)

Emotion category   Precision   Recall   F1-score
Happy              78.57       84.03    81.21
Sad                85.77       91.02    88.32
Neutral            91.62       82.55    86.85
Anger              83.61       90.00    86.69
Macro avg          84.89       86.90    85.76
Weighted avg       86.66       86.32    86.34
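Each per-class F1 value is the harmonic mean of that class's precision and recall, so the rows can be cross-checked from the first two columns. The sketch below does that with the precision and recall figures listed in Table 2; the function and variable names are illustrative. For Anger it gives 86.69, the value also listed for DRG in Table 4.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, copied from Table 2.
table2 = {
    "Happy": (78.57, 84.03),
    "Sad": (85.77, 91.02),
    "Neutral": (91.62, 82.55),
    "Anger": (83.61, 90.00),
}

per_class_f1 = {name: round(f1_score(p, r), 2) for name, (p, r) in table2.items()}

# Macro averages are unweighted means over the four classes.
macro_precision = round(sum(p for p, _ in table2.values()) / len(table2), 2)
macro_recall = round(sum(r for _, r in table2.values()) / len(table2), 2)

for name, value in per_class_f1.items():
    print(f"{name}: F1 = {value:.2f}")
print(f"Macro avg: precision = {macro_precision:.2f}, recall = {macro_recall:.2f}")
```

The weighted averages cannot be reproduced this way because they depend on the number of test utterances per class, which the table does not list.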

Table 3 Accuracy and F1-score of different models (%)

Model                Accuracy   F1-score
bc-LSTM[18]          75.10      74.10
CHFusion[19]         76.80      76.50
MMGCN                78.26      78.66
MM-DFN[20]           79.64      79.60
PATHOSnet V2[21]     78.00      80.40
COGMEN               84.62      84.53
DRG (this paper)     86.32      86.34

Table 4 Comparison of F1-scores of the two models on the IEMOCAP dataset (%)

Model    Happy   Sad     Neutral   Anger   Accuracy   Weighted avg
COGMEN   81.41   88.00   82.79     86.13   84.62      84.53
DRG      81.21   88.32   86.85     86.69   86.32      86.34
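To make the comparison easier to read, the per-class differences between the two rows can be computed directly. This is plain arithmetic on the F1 values listed in Table 4; the dictionary names are illustrative.

```python
# F1-scores (%) copied from Table 4.
cogmen = {"Happy": 81.41, "Sad": 88.00, "Neutral": 82.79, "Anger": 86.13, "Weighted avg": 84.53}
drg = {"Happy": 81.21, "Sad": 88.32, "Neutral": 86.85, "Anger": 86.69, "Weighted avg": 86.34}

# Positive values mean DRG scores higher than COGMEN.
gains = {name: round(drg[name] - cogmen[name], 2) for name in cogmen}
largest = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
print("Largest gain:", largest)
```

On these figures the largest difference is in the Neutral class, while Happy is slightly lower for DRG than for COGMEN.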

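Fig. 4 Visualization of feature nodes using t-SNE

Fig. 5 Comparison of confusion matrices of recognition results before and after using the DRG module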

Table 5 Comparison of results for different modalities (%)

Modality               F1-score
Audio                  64.42
Video                  48.45
Text                   82.81
Video + Text           83.12
Audio + Text           86.05
Video + Audio + Text   86.34

References:

[1] JORDAN M I. Serial order: a parallel distributed processing approach[J]. Advances in Psychology, 1997, 121: 471-495.
[2] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22)[2023-11-24]. http://arxiv.org/abs/1609.02907.
[3] MAJUMDER N, PORIA S, HAZARIKA D, et al. DialogueRNN: an attentive RNN for emotion detection in conversations[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 6818-6825. doi: 10.1609/aaai.v33i01.33016818
[4] CHUNG J, GULCEHRE C, CHO K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. (2014-12-11)[2023-11-24]. http://arxiv.org/abs/1412.3555.
[5] GHOSAL D, MAJUMDER N, GELBUKH A, et al. COSMIC: commonsense knowledge for emotion identification in conversations[EB/OL]. (2020-10-06)[2023-11-24]. http://arxiv.org/abs/2010.02795.
[6] GHOSAL D, MAJUMDER N, PORIA S, et al. DialogueGCN: a graph convolutional neural network for emotion recognition in conversation[EB/OL]. (2019-08-30)[2023-11-24]. http://arxiv.org/abs/1908.11540.
[7] HU Jingwen, LIU Yuchen, ZHAO Jinming, et al. MMGCN: multimodal fusion via deep graph convolution network for emotion recognition in conversation[EB/OL]. (2021-07-14)[2023-11-24]. http://arxiv.org/abs/2107.06779.
[8] CHEN Ming, WEI Zhewei, HUANG Zengfeng, et al. Simple and deep graph convolutional networks[C]//International Conference on Machine Learning. [S. l.]: ACM, 2020: 1725-1735.
[9] ZHOU Kaixiong, HUANG Xiao, LI Yuening, et al. Towards deeper graph neural networks with differentiable group normalization[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2020: 4917-4928.
[10] JOSHI A, BHAT A, JAIN A, et al. COGMEN: contextualized GNN based multimodal emotion recognition[EB/OL]. (2022-05-05)[2023-11-24]. http://arxiv.org/abs/2205.02455.
[11] SCHLICHTKRULL M, KIPF T N, BLOEM P, et al. Modeling relational data with graph convolutional networks[M]//The Semantic Web. Cham: Springer, 2018: 593-607.
[12] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: ACM, 2017: 6000-6010.
[13] BUSSO C, BULUT M, LEE C C, et al. IEMOCAP: interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42(4): 335-359. doi: 10.1007/s10579-008-9076-6
[14] EYBEN F, WÖLLMER M, SCHULLER B. Opensmile: the munich versatile and fast open-source audio feature extractor[C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze: ACM, 2010: 1459-1462.
[15] BALTRUŠAITIS T, ROBINSON P, MORENCY L P. OpenFace: an open source facial behavior analysis toolkit[C]//2016 IEEE Winter Conference on Applications of Computer Vision (WACV). New York: IEEE, 2016: 1-10.
[16] REIMERS N, GUREVYCH I. Sentence-BERT: sentence embeddings using siamese BERT-networks[EB/OL]. (2019-08-27)[2023-12-24]. http://arxiv.org/abs/1908.10084.
[17] CAI D, LAM W. Graph transformer for graph-to-sequence learning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 7464-7471. doi: 10.1609/aaai.v34i05.6243
[18] PORIA S, CAMBRIA E, HAZARIKA D, et al. Context-dependent sentiment analysis in user-generated videos[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: Association for Computational Linguistics, 2017: 873-883.
[19] MAJUMDER N, HAZARIKA D, GELBUKH A, et al. Multimodal sentiment analysis using hierarchical fusion with context modeling[J]. Knowledge-based Systems, 2018, 161: 124-133. doi: 10.1016/j.knosys.2018.07.041
[20] HU Dou, HOU Xiaolong, WEI Lingwei, et al. MM-DFN: multimodal dynamic fusion network for emotion recognition in conversations[C]//ICASSP 2022: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Singapore: IEEE, 2022: 7037-7041.
[21] SCOTTI V, GALATI F, SBATTELLA L, et al. Combining deep and unsupervised features for multilingual speech emotion recognition[C]//Pattern Recognition: ICPR International Workshops and Challenges. Cham: Springer, 2021: 114-128.
[22] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(86): 2579-2605.