JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2023, Vol. 58, Issue (12): 31-40, 51. doi: 10.6040/j.issn.1671-9352.1.2022.421
Chan LU1,2, Junjun GUO1,2,*, Kaiwen TAN1,2, Yan XIANG1,2, Zhengtao YU1,2
[1] LIU J M, ZHANG P X, LIU Y, et al. Summary of multi-modal sentiment analysis technology[J]. Journal of Frontiers of Computer Science & Technology, 2021, 15(7): 1165.
[2] SUN Z, SARMA P, SETHARES W, et al. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI Press, 2020: 8992-8999.
[3] ZADEH A, CHEN M, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen: Association for Computational Linguistics, 2017: 1103-1114.
[4] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018: 5634-5641.
[5] TSAI Y H H, BAI S, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: Association for Computational Linguistics, 2019: 6558-6569.
[6] YU W, XU H, YUAN Z, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021: 10790-10797.
[7] WILLIAMS J, KLEINEGESSE S, COMANESCU R, et al. Recognizing emotions in video using multimodal DNN feature fusion[C]//Proceedings of Grand Challenge and Workshop on Human Multimodal Language. Melbourne: Association for Computational Linguistics, 2018: 11-19.
[8] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne: Association for Computational Linguistics, 2018: 2247-2256.
[9] MAI S, HU H, XING S. Modality to modality translation: an adversarial representation learning and graph fusion network for multimodal fusion[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI Press, 2020: 164-172.
[10] ZHOU S, JIA J, YIN Y, et al. Understanding the teaching styles by an attention based multi-task cross-media dimensional modeling[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2019: 1322-1330.
[11] CHEN M, LI X. SWAFN: sentimental words aware fusion network for multimodal sentiment analysis[C]//Proceedings of the 28th International Conference on Computational Linguistics. Barcelona: International Committee on Computational Linguistics, 2020: 1067-1077.
[12] HAN W, CHEN H, PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]//Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana: Association for Computational Linguistics, 2021: 9180-9192.
[13] PHAM H, LIANG P P, MANZINI T, et al. Found in translation: learning robust joint representations by cyclic translations between modalities[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu: AAAI Press, 2019: 6892-6899.
[14] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30: 5998-6008.
[15] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2020: 1122-1131.
[16] WANG Y, SHEN Y, LIU Z, et al. Words can shift: dynamically adjusting word representations using nonverbal behaviors[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Honolulu: AAAI Press, 2019: 7216-7223.
[17] RAHMAN W, HASAN M K, LEE S, et al. Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2020: 2359-2369.
[18] ANDREW G, ARORA R, BILMES J, et al. Deep canonical correlation analysis[C]//International Conference on Machine Learning. Atlanta: JMLR.org, 2013: 1247-1255.
[19] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis: Association for Computational Linguistics, 2019: 4171-4186.
[20] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
[21] ZADEH A, ZELLERS R, PINCUS E, et al. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos[J]. IEEE Intelligent Systems, 2016, 31(6): 82-88.
[22] ZADEH A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers). Melbourne: Association for Computational Linguistics, 2018: 2236-2246.