
Journal of Shandong University (Natural Science) ›› 2020, Vol. 55 ›› Issue (11): 78-86. doi: 10.6040/j.issn.1671-9352.1.2019.024


A text classification model based on BiLSTM and label embedding

DONG Yan-ru1, LIU Pei-yu1, LIU Wen-feng1,2, ZHAO Hong-yan3

  1. School of Information Science and Engineering, Shandong Normal University, Jinan 250358, Shandong, China;
  2. School of Computer Science, Heze University, Heze 274015, Shandong, China;
  3. School of Information Engineering, Shandong Yingcai University, Jinan 250101, Shandong, China
  • Published: 2020-11-17
  • About the first author: DONG Yan-ru (1995- ), female, master's degree candidate; her research interest is natural language processing. E-mail: 858344533@qq.com
  • Supported by: National Natural Science Foundation of China (61373148); National Natural Science Foundation of China Youth Program (61502151); Shandong Provincial Social Science Planning Project (17CHLJ18, 17CHLJ33, 17CHLJ30); Natural Science Foundation of Shandong Province (ZR2014FL010); Shandong Provincial Education Department Research Project (J15LN34)


Abstract: A text classification model based on BiLSTM and label embedding is proposed. First, the BERT model is introduced to extract high-quality sentence features. Then, a BiLSTM with an attention mechanism builds text representations that integrate important contextual information. Finally, labels and words are learned in a joint embedding space, and the compatibility scores derived from label-word pairs are used to weight both the label and sentence representations, realizing a double embedding of label information; the classifier then categorizes sentences according to the given label information. Experimental results on five authoritative benchmark datasets show that the method effectively improves text classification performance and offers better practicability.
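
To make the pipeline in the abstract concrete, the following PyTorch code is a minimal sketch of one plausible reading of the model, not the authors' implementation. It assumes pre-extracted BERT token features as input; the module name, dimensions, max-pooled compatibility attention, and the additive combination of sentence and label vectors are all illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LabelEmbeddingTextClassifier(nn.Module):
        def __init__(self, feat_dim=768, hidden_dim=256, num_labels=5):
            super().__init__()
            # BiLSTM over BERT token features to capture context in both directions
            self.bilstm = nn.LSTM(feat_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            # Label embeddings learned in the same space as the BiLSTM outputs
            self.label_emb = nn.Parameter(torch.randn(num_labels, 2 * hidden_dim))
            self.classifier = nn.Linear(2 * hidden_dim, num_labels)

        def forward(self, bert_feats):
            # bert_feats: (batch, seq_len, feat_dim), e.g. BERT last hidden states
            h, _ = self.bilstm(bert_feats)          # (batch, seq_len, 2*hidden)
            # Compatibility score between every label and every word
            scores = torch.einsum('ld,bsd->bls', self.label_emb, h)
            # Weight words by their best compatibility with any label
            word_attn = F.softmax(scores.max(dim=1).values, dim=-1)   # (batch, seq)
            sent = torch.einsum('bs,bsd->bd', word_attn, h)           # sentence repr.
            # Weight labels by their best compatibility with any word
            # ("double embedding": both sides are reweighted by the scores)
            label_attn = F.softmax(scores.max(dim=2).values, dim=-1)  # (batch, labels)
            label_ctx = torch.einsum('bl,ld->bd', label_attn, self.label_emb)
            return self.classifier(sent + label_ctx)  # logits over labels

    # Usage sketch with random stand-ins for BERT features:
    model = LabelEmbeddingTextClassifier()
    feats = torch.randn(2, 32, 768)   # (batch=2, seq_len=32, BERT dim=768)
    logits = model(feats)             # shape (2, 5)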

Key words: text classification, text representation, label embedding

CLC number: TP311.5