
Journal of Shandong University (Natural Science) ›› 2023, Vol. 58 ›› Issue (5): 36-45. doi: 10.6040/j.issn.1671-9352.0.2021.790


  • About the authors: MENG Jinxu (1994- ), male, master's student; research interests: natural language processing, text classification, text similarity. E-mail: 2667063838@qq.com. *Corresponding author: SHAN Hongtao (1971- ), female, PhD, associate professor; research interests: AI technology and its applications. E-mail: shanhongtao@sues.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61803255)

Text classification model based on dual-channel feature fusion based on XLNet

MENG Jinxu1, SHAN Hongtao1*, HUANG Runcai1, YAN Fengting3, LI Zhiwei1, ZHENG Guangyuan2, LIU Yiming1, SHI Changtong1   

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China;
    2. College of Information Technology, Shanghai Jianqiao University, Shanghai 201306, China;
    3. Shanghai Qingxi Intelligent Technology Co., Ltd., Shanghai 201800, China
  • Published:2023-05-15


Abstract: A dual-channel feature-fusion text classification model based on XLNet (XLNet-CNN-BiGRU, XLCBG) is proposed. Compared with a single-channel model, the XLCBG model extracts richer semantic features by fusing the feature information of the XLNet+CNN and XLNet+BiGRU channels. The XLCBG model applies max pooling, average pooling, and an attention mechanism to the fused features, extracting, respectively, the vector of global maximum feature values, the global mean feature vector, and the key features identified by the attention mechanism in place of the whole vector; this diversifies the fused-feature processing and widens the choice of the optimal classification model. Finally, popular text classification models are compared with the XLCBG model. The experimental results show that the XLCBG-S model exhibits better classification performance than the other models on the Chinese THUCNews dataset, and the XLCBG-Ap model performs best on the English AG News dataset. On the English 20NewsGroups dataset, the XLCBG-Att model is superior to the other models in accuracy and recall, while the XLCBG-Mp model is superior in precision and F1.
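The dual-channel fusion and the three pooling strategies described in the abstract can be sketched as follows. This is a minimal NumPy illustration with random stand-in features; in the actual model the two channels are XLNet+CNN and XLNet+BiGRU encoders, and the attention scoring vector would be learned rather than random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two channel outputs (seq_len x dim); in the paper
# these come from the XLNet+CNN and XLNet+BiGRU encoders.
seq_len, dim = 8, 16
cnn_feats = rng.standard_normal((seq_len, dim))
bigru_feats = rng.standard_normal((seq_len, dim))

# Dual-channel fusion: concatenate along the feature dimension.
fused = np.concatenate([cnn_feats, bigru_feats], axis=-1)  # shape (8, 32)

def max_pool(x):
    # XLCBG-Mp: element-wise maximum over the sequence positions
    return x.max(axis=0)

def avg_pool(x):
    # XLCBG-Ap: element-wise mean over the sequence positions
    return x.mean(axis=0)

def attention_pool(x, w):
    # XLCBG-Att: softmax-weighted sum of positions, scored by vector w
    scores = x @ w                        # one score per position
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                  # softmax attention weights
    return alpha @ x                      # weighted combination

w = rng.standard_normal(fused.shape[1])   # hypothetical scoring vector
for name, vec in [("max", max_pool(fused)),
                  ("avg", avg_pool(fused)),
                  ("att", attention_pool(fused, w))]:
    print(name, vec.shape)
```

Each strategy collapses the fused sequence into a single fixed-length vector, which is what allows the downstream classifier (softmax layer) to stay the same while the pooling variant changes.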

Key words: XLNet, dual-channel, text classification, BiGRU, CNN

CLC number: 

  • TP391.1
[1] WANG Kui, LIU Baisong. A summary of text classification research[J]. Data Communications, 2019(3): 37-47.
[2] MANEK A S, SHENOY P D, MOHAN M C, et al. Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier[J]. World Wide Web, 2017, 20(2):135-154.
[3] TANHA J, VAN SOMEREN M, AFSARMANESH H. Semi-supervised self-training for decision tree classifiers[J]. International Journal of Machine Learning and Cybernetics, 2017, 8(1):355-370.
[4] TANG B, KAY S, HE H. Toward optimal feature selection in naive Bayes for text categorization[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(9):2508-2521.
[5] KIM Y. Convolutional neural networks for sentence classification[C] //Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP). Doha: Association for Computational Linguistics, 2014: 1746-1751.
[6] LIU P F, QIU X P, HUANG X J. Recurrent neural network for text classification with multi-task learning[C] //Proceedings of the Twenty-fifth International Joint Conference on Artificial Intelligence(IJCAI-16). New York: AAAI Press, 2016.
[7] CHEN Hong, YANG Yan, DU Shengdong. Research on aspect-level sentiment analysis of user reviews[J]. Journal of Frontiers of Computer Science & Technology, 2021, 15(3): 478-485.
[8] TAO Liang, LIU Baoning, LIANG Wei. Automatic detection research of arrhythmia based on CNN-LSTM hybrid model[J]. Journal of Shandong University (Engineering Science), 2021, 51(3): 30-36.
[9] WU Hanyu, YAN Jiang, HUANG Shaobin, et al. CNN_BiLSTM_Attention hybrid model for text classification[J]. Computer Science, 2020, 47(S2): 23-27.
[10] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[J/OL]. arXiv, 2013. https://arxiv.org/pdf/1301.3781.pdf.
[11] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J/OL]. arXiv, 2018. https://arxiv.org/abs/1810.04805.
[12] CHEN Deguang, MA Jinlin, MA Ziping, et al. Review of pre-training techniques for natural language processing[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(8): 1359-1388.
[13] DONG Yanru, LIU Peiyu, LIU Wenfeng, et al. A text classification model based on BiLSTM and label embedding[J]. Journal of Shandong University (Natural Science), 2020, 55(11): 78-86.
[14] YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[J/OL]. arXiv, 2019. https://arxiv.org/abs/1906.08237.
[15] LAI S, XU L, LIU K, et al. Recurrent convolutional neural networks for text classification[C] //Twenty-ninth AAAI Conference on Artificial Intelligence. Austin: AAAI Press, 2015: 2267-2273.
[16] JOHNSON R, ZHANG T. Deep pyramid convolutional neural networks for text categorization[C] //Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver: Association for Computational Linguistics, 2017: 562-570.
[17] ZHENG Cheng, CHEN Jie, DONG Chunyang. Deep neural network combined with graph convolution for text classification[J]. Computer Engineering and Applications, 2022, 58(7): 206-212.
[18] YAN Yue, HUO Qirun, LI Tianhao, et al. Design and implementation of text classification based on convolutional neural network with multiple attention mechanisms[J]. Journal of Chinese Computer Systems, 2021, 42(2): 362-367.
[19] LI Qihang, LIAO Wei, MENG Jingwen. Dual-channel DAC-RNN text categorization model based on attention mechanism[J]. Computer Engineering and Applications, 2022, 58(16): 157-163.