《山东大学学报(理学版)》 (Journal of Shandong University, Natural Science), 2020, Vol. 55, Issue (11): 78-86. DOI: 10.6040/j.issn.1671-9352.1.2019.024
DONG Yan-ru1, LIU Pei-yu1, LIU Wen-feng1,2, ZHAO Hong-yan3
Abstract: A text classification model based on a bidirectional long short-term memory (BiLSTM) network and label embedding is proposed. First, the BERT model is used to extract sentence features. Then, a BiLSTM with an attention mechanism produces a text representation that fuses the salient contextual information. Finally, labels and words are learned in a joint embedding space, and the compatibility scores between labels and words are used to weight both the label and the sentence representations, achieving a dual embedding of label information; the classifier then categorizes sentences according to the given label information. Experiments on five widely used benchmark datasets show that the method effectively improves text classification performance and offers better practicality.
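To make the pipeline concrete, the following is a minimal PyTorch sketch of the architecture described in the abstract, assuming pre-extracted BERT token features as input. The class name, dimensions, and the exact way compatibility scores weight the two representations are illustrative assumptions for exposition, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BiLSTMLabelEmbed(nn.Module):
        # Sketch: pre-extracted BERT features -> BiLSTM -> attention pooling,
        # plus label-word compatibility pooling; the two weighted views are
        # fused and classified. Hyperparameters are illustrative only.
        def __init__(self, bert_dim=768, hidden_dim=256, num_labels=5):
            super().__init__()
            self.bilstm = nn.LSTM(bert_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            self.attn = nn.Linear(2 * hidden_dim, 1)       # token-level attention
            self.label_emb = nn.Embedding(num_labels, 2 * hidden_dim)
            self.classifier = nn.Linear(4 * hidden_dim, num_labels)

        def forward(self, bert_feats):                     # (B, T, bert_dim)
            h, _ = self.bilstm(bert_feats)                 # (B, T, 2H)
            # Context attention: importance of each token from BiLSTM states.
            a = F.softmax(self.attn(h), dim=1)             # (B, T, 1)
            ctx_repr = (a * h).sum(dim=1)                  # (B, 2H)
            # Label-word compatibility: dot-product scores between every
            # token and every label embedding, reduced to per-token weights.
            compat = torch.einsum('btd,ld->btl', h, self.label_emb.weight)
            b = F.softmax(compat.max(dim=-1).values, dim=1)  # (B, T)
            label_repr = (b.unsqueeze(-1) * h).sum(dim=1)  # (B, 2H)
            # Fuse the two weighted representations and classify.
            return self.classifier(torch.cat([ctx_repr, label_repr], dim=-1))

    model = BiLSTMLabelEmbed()
    logits = model(torch.randn(2, 32, 768))  # 2 sentences, 32 tokens, BERT-base dims
    print(logits.shape)                      # torch.Size([2, 5])

In practice such a model would be trained end to end with a standard cross-entropy loss; the abstract does not specify whether BERT is fine-tuned or frozen, so the sketch treats its features as fixed inputs.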