《山东大学学报(理学版)》 (Journal of Shandong University, Natural Science), 2020, Vol. 55, Issue (11): 78-86. DOI: 10.6040/j.issn.1671-9352.1.2019.024
DONG Yan-ru1, LIU Pei-yu1, LIU Wen-feng1,2, ZHAO Hong-yan3
Abstract: A text classification model based on a bidirectional long short-term memory (BiLSTM) network and label embedding is proposed. First, the BERT model is used to extract sentence features. Then, a BiLSTM with an attention mechanism produces a text representation that fuses the salient contextual information. Finally, labels and words are learned in a joint embedding space, and the compatibility scores between labels and words are used to weight both the label and the sentence representations, achieving a dual embedding of label information; the classifier then categorizes sentences according to the given label information. Experiments on five widely used benchmark datasets show that the method effectively improves text classification performance and offers better practicality.
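To make the pipeline concrete, the following is a minimal PyTorch sketch of the architecture described in the abstract, assuming pre-extracted BERT token features as input. The class name, dimensions, and the exact way compatibility scores weight the two representations are illustrative assumptions for exposition, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BiLSTMLabelEmbed(nn.Module):
        # Sketch: pre-extracted BERT features -> BiLSTM -> attention pooling,
        # plus label-word compatibility pooling; the two weighted views are
        # fused and classified. Hyperparameters are illustrative only.
        def __init__(self, bert_dim=768, hidden_dim=256, num_labels=5):
            super().__init__()
            self.bilstm = nn.LSTM(bert_dim, hidden_dim,
                                  batch_first=True, bidirectional=True)
            self.attn = nn.Linear(2 * hidden_dim, 1)       # token-level attention
            self.label_emb = nn.Embedding(num_labels, 2 * hidden_dim)
            self.classifier = nn.Linear(4 * hidden_dim, num_labels)

        def forward(self, bert_feats):                     # (B, T, bert_dim)
            h, _ = self.bilstm(bert_feats)                 # (B, T, 2H)
            # Context attention: importance of each token from BiLSTM states.
            a = F.softmax(self.attn(h), dim=1)             # (B, T, 1)
            ctx_repr = (a * h).sum(dim=1)                  # (B, 2H)
            # Label-word compatibility: dot-product scores between every
            # token and every label embedding, reduced to per-token weights.
            compat = torch.einsum('btd,ld->btl', h, self.label_emb.weight)
            b = F.softmax(compat.max(dim=-1).values, dim=1)  # (B, T)
            label_repr = (b.unsqueeze(-1) * h).sum(dim=1)  # (B, 2H)
            # Fuse the two weighted representations and classify.
            return self.classifier(torch.cat([ctx_repr, label_repr], dim=-1))

    model = BiLSTMLabelEmbed()
    logits = model(torch.randn(2, 32, 768))  # 2 sentences, 32 tokens, BERT-base dims
    print(logits.shape)                      # torch.Size([2, 5])

In practice such a model would be trained end to end with a standard cross-entropy loss; the abstract does not specify whether BERT is fine-tuned or frozen, so the sketch treats its features as fixed inputs.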