
Journal of Shandong University (Natural Science) ›› 2023, Vol. 58 ›› Issue (12): 31-40, 51. doi: 10.6040/j.issn.1671-9352.1.2022.421


Multimodal sentiment analysis based on text-guided hierarchical adaptive fusion

Chan LU1,2, Junjun GUO1,2,*, Kaiwen TAN1,2, Yan XIANG1,2, Zhengtao YU1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2022-09-29  Online: 2023-12-20  Published: 2023-12-19
  • Corresponding author: Junjun GUO  E-mail: 904943362@qq.com; guojjgb@163.com
  • About the first author: LU Chan (1997—), female, master's degree candidate; her research interests include natural language processing and multimodal sentiment analysis. E-mail: 904943362@qq.com
  • Funding:
    National Key Research and Development Program of China (2020AAA0107904); National Natural Science Foundation of China (62366025); National Natural Science Foundation of China (62241604); Basic Research General Program of the Yunnan Provincial Department of Science and Technology (202301AT070444)



Abstract:

This paper proposes a multimodal hierarchical adaptive fusion method guided by the text modality, which uses textual information to drive the hierarchical adaptive screening and fusion of multimodal information. First, importance representations between each pair of modalities are obtained with a cross-modal attention mechanism; then a multimodal adaptive gating mechanism performs hierarchical adaptive fusion based on the important multimodal information; finally, the multimodal features and the modality-importance information are combined to carry out multimodal sentiment analysis. Experimental results on the public MOSI and MOSEI datasets show that, compared with the baseline model, the proposed method improves accuracy and F1 score by 0.76% and 0.7%, respectively.
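The pipeline described in the abstract (text-guided cross-modal attention followed by adaptive gating, then pooling for prediction) can be sketched as follows. This is a minimal NumPy illustration under assumed feature shapes and randomly initialized gate weights, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query, keyval):
    """Text time steps attend over another modality's time steps."""
    d = query.shape[-1]
    scores = query @ keyval.T / np.sqrt(d)      # (Tq, Tk)
    return softmax(scores, axis=-1) @ keyval    # (Tq, d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(text, other, W_g):
    """Adaptive gate: the text stream decides how much of the other modality passes."""
    g = sigmoid(np.concatenate([text, other], axis=-1) @ W_g)  # (Tq, d)
    return text + g * other

d, Tt, Ta, Tv = 8, 5, 7, 6
text  = rng.normal(size=(Tt, d))   # text features (e.g. from a BERT-style encoder)
audio = rng.normal(size=(Ta, d))   # acoustic features
video = rng.normal(size=(Tv, d))   # visual features

W_ga = rng.normal(size=(2 * d, d)) * 0.1   # illustrative gate weights
W_gv = rng.normal(size=(2 * d, d)) * 0.1

# Level 1: text-guided cross-modal attention screens important audio/visual info
a_att = cross_modal_attention(text, audio)
v_att = cross_modal_attention(text, video)

# Level 2: adaptive gates fuse the attended modalities into the text stream
fused = gated_fuse(gated_fuse(text, a_att, W_ga), v_att, W_gv)

# Level 3: a pooled multimodal feature would feed the sentiment regressor
utterance = fused.mean(axis=0)
print(utterance.shape)  # (8,)
```

Each gate is conditioned on both the text and the attended modality, so uninformative audio/visual time steps can be suppressed before fusion.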

Key words: multimodal sentiment analysis, multimodal fusion, attention mechanism, gating network

CLC number: 

  • TP391

Fig. 1

Example of multimodal sentiment analysis

Fig. 2

Example of the audio modality

Fig. 3

Architecture of the text-guided multimodal hierarchical adaptive fusion model

Fig. 4

Structure of the local cross-modal interaction module

Fig. 5

Structure of the global multimodal feature interaction module

Table 1

Dataset partition

Dataset Training Validation Test Total
MOSI 1 284 229 686 2 199
MOSEI 16 326 1 871 4 659 22 856

Fig. 6

Sentiment distribution of the CMU-MOSI dataset

Fig. 7

Sentiment distribution of the CMU-MOSEI dataset

Table 2

Experimental parameter settings

Parameter MOSI MOSEI
batch_size 32 32
learning rate 1×10^-3 1×10^-4
learning rate (BERT) 5×10^-5 5×10^-5
dropout 0.2 0.3
optimizer Adam Adam
activation function ReLU ReLU
d_m (low-dimensional space size) 128 128
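The settings in Table 2 can be collected into a plain configuration mapping; the key names below are illustrative, not taken from the paper's code:

```python
# Hyperparameters from Table 2, as plain config dicts (illustrative key names).
CONFIG = {
    "MOSI":  {"batch_size": 32, "lr": 1e-3, "lr_bert": 5e-5,
              "dropout": 0.2, "optimizer": "Adam", "activation": "ReLU", "d_m": 128},
    "MOSEI": {"batch_size": 32, "lr": 1e-4, "lr_bert": 5e-5,
              "dropout": 0.3, "optimizer": "Adam", "activation": "ReLU", "d_m": 128},
}

# The two datasets share every setting except the base learning rate and dropout.
diff = {k for k in CONFIG["MOSI"] if CONFIG["MOSI"][k] != CONFIG["MOSEI"][k]}
print(sorted(diff))  # ['dropout', 'lr']
```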

Table 3

Experimental results of different models on the CMU-MOSI dataset

Model MAE Corr Acc_2 F1-score
TFN 0.901 0.698 80.81 80.74
LMF 0.917 0.695 82.52 82.42
Mult 0.861 0.711 84.10 83.90
MAG-BERT 0.773 0.770 84.88 84.83
MISA 0.797 0.755 83.55 84.08
ICCN 0.862 0.714 83.07 83.02
Self-MM 0.720 0.799 85.67 85.68
Ours 0.710 0.801 86.43 86.38
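The four metrics reported in the results tables can be computed from sentiment scores as sketched below. This follows the common MOSI convention of excluding exactly-neutral items for Acc_2 and uses binary positive-class F1, which may differ in detail from the paper's protocol:

```python
import numpy as np

def mosi_metrics(y_true, y_pred):
    """MAE, Pearson correlation, binary accuracy (Acc_2), and F1 over
    sentiment scores in [-3, 3]; neutral (exactly 0) ground truth is
    excluded from Acc_2/F1, a common MOSI convention."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.abs(y_true - y_pred).mean()
    corr = np.corrcoef(y_true, y_pred)[0, 1]
    nz = y_true != 0                      # drop neutral ground truth
    t, p = y_true[nz] > 0, y_pred[nz] > 0
    acc2 = (t == p).mean()
    tp = (t & p).sum()
    prec = tp / max(p.sum(), 1)
    rec = tp / max(t.sum(), 1)
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    return mae, corr, acc2, f1

# Toy check with scores in the style of the paper's case analysis
y_true = [-2.8, 1.8, 0.0, 2.0]
y_pred = [-2.754, 1.785, 0.084, -0.5]
mae, corr, acc2, f1 = mosi_metrics(y_true, y_pred)
print(round(mae, 3), round(acc2, 3))  # 0.661 0.667
```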

Table 4

Experimental results of different models on the CMU-MOSEI dataset

Model MAE Corr Acc_2 F1-score
TFN 0.539 0.700 82.50 82.11
LMF 0.623 0.677 82.01 82.13
Mult 0.580 0.703 82.54 82.33
MAG-BERT 0.605 0.755 84.78 84.71
MISA 0.539 0.753 84.85 84.83
ICCN 0.565 0.713 84.18 84.15
Self-MM 0.522 0.770 85.28 85.04
Ours 0.531 0.762 85.36 85.29

Fig. 8

Results of the modality ablation study on the CMU-MOSI dataset

Fig. 9

Results of the modality-importance ablation study on the CMU-MOSI dataset

Table 5

Results of the model ablation study on the CMU-MOSI dataset

Model MAE Corr Acc_2 F1-score
Ours 0.710 0.801 86.43 86.38
(-) cross-modal attention 0.711 0.792 85.67 85.61
(-) gating unit 0.707 0.791 85.06 85.09
(-) text gate 0.708 0.800 85.52 84.48
(-) audio gate 0.730 0.787 85.21 85.19
(-) visual gate 0.730 0.787 85.21 85.19
correlated-feature fusion 0.729 0.793 86.13 86.03
specific-feature fusion 0.730 0.799 85.06 85.03

Table 6

Case analysis results on the MOSI dataset

Case Multimodal information True sentiment, value Predicted sentiment, value
1 Text: "This movie frustrated me"; Audio: high pitch; Visual: frowning Negative, -2.8 Negative, -2.754
2 Text: "And it is a really funny"; Audio: strong tone; Visual: smiling Positive, 1.8 Positive, 1.785
3 Text: "I think that the movie did rely on you to kind of figure it out"; Audio: even pace, calm tone; Visual: expressionless Neutral, 0 Neutral, 0.084
1 JIMING LIU, PEIXIANG Z, YING LIU, et al. Summary of multi-modal sentiment analysis technology[J]. Journal of Frontiers of Computer Science & Technology, 2021, 15(7): 1165.
2 SUN Z, SARMA P, SETHARES W, et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI Press, 2020: 8992-8999.
3 ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen: Association for Computational Linguistics, 2017: 1103-1114.
4 ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018: 5634-5641.
5 TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: Association for Computational Linguistics, 2019: 6558-6569.
6 YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 10790-10797.
7 WILLIAMS J, KLEINEGESSE S, COMANESCU R, et al. Recognizing emotions in video using multimodal DNN feature fusion[C]//Proceedings of Grand Challenge and Workshop on Human Multimodal Language. Melbourne: Association for Computational Linguistics, 2018: 11-19.
8 LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne: Association for Computational Linguistics, 2018: 2247-2256.
9 MAI S, HU H, XING S. Modality to modality translation: an adversarial representation learning and graph fusion network for multimodal fusion[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 164-172.
10 ZHOU S, JIA J, YIN Y, et al. Understanding the teaching styles by an attention based multi-task cross-media dimensional modeling[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2019: 1322-1330.
11 CHEN M, LI X. Swafn: sentimental words aware fusion network for multimodal sentiment analysis[C]//Proceedings of the 28th International Conference on Computational Linguistics. Barcelona: International Committee on Computational Linguistics, 2020: 1067-1077.
12 HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana: Association for Computational Linguistics, 2021: 9180-9192.
13 PHAM H, LIANG P P, MANZINI T, et al. Found in translation: learning robust joint representations by cyclic translations between modalities[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu: AAAI Press, 2019: 6892-6899.
14 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30: 5998-6008.
15 HAZARIKA D, ZIMMERMANN R, PORIA S. Misa: modality-invariant and specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2020: 1122-1131.
16 WANG Y, SHEN Y, LIU Z, et al. Words can shift: dynamically adjusting word representations using nonverbal behaviors[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu: AAAI Press, 2019: 7216-7223.
17 RAHMAN W, HASAN M K, LEE S, et al. Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2020: 2359-2369.
18 ANDREW G, ARORA R, BILMES J, et al. Deep canonical correlation analysis[C]//International Conference on Machine Learning. Atlanta: JMLR. org, 2013: 1247-1255.
19 DEVLIN J, CHANG M W, LEE K, et al. Bert: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis: Association for Computational Linguistics, 2019: 4171-4186.
20 HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
21 ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. IEEE Intelligent Systems, 2016: 82-88.
22 ZADEH A, PU P. Multimodal language analysis in the wild: cmu-mosei dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers). Melbourne: Association for Computational Linguistics, 2018: 2236-2246.