JOURNAL OF SHANDONG UNIVERSITY(NATURAL SCIENCE) ›› 2020, Vol. 55 ›› Issue (11): 78-86. doi: 10.6040/j.issn.1671-9352.1.2019.024


A text classification model based on BiLSTM and label embedding

DONG Yan-ru1, LIU Pei-yu1, LIU Wen-feng1,2, ZHAO Hong-yan3   

  1. School of Information Science and Engineering, Shandong Normal University, Jinan 250358, Shandong, China;
    2. School of Computer Science, Heze University, Heze 274015, Shandong, China;
    3. School of Information Engineering, Shandong Yingcai University, Jinan 250101, Shandong, China
  • Published: 2020-11-17

Abstract: A text classification model based on BiLSTM and label embedding was proposed. First, the BERT model was introduced to extract high-quality sentence features. Then, a BiLSTM with an attention mechanism was used to obtain text representations that integrate important context information. Finally, labels and words were learned in a joint embedding space, and the compatibility scores derived from label-word pairs were used to weight both the label and the sentence representations, realizing a double label embedding. The classifier then assigns each sentence to a class according to the given label information. Experimental results on five widely used benchmark datasets show that the method effectively improves text classification performance and that the model is more practical.
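The pipeline described above (BERT features, BiLSTM with attention, and label-word compatibility scores in a joint space) can be summarized in a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the layer names, dimensions, and the exact way the compatibility scores weight the label and sentence representations are assumptions, and the token features are assumed to be pre-extracted by a BERT encoder.

import torch
import torch.nn as nn

class BiLSTMLabelEmbedding(nn.Module):
    """Hypothetical sketch: BiLSTM + attention + joint label-word embedding."""
    def __init__(self, feat_dim=768, hidden_dim=256, num_labels=5):
        super().__init__()
        # BiLSTM over token features (assumed to come from a BERT encoder)
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        # additive self-attention producing the sentence representation
        self.attn = nn.Linear(2 * hidden_dim, 1)
        # label embeddings learned in the same space as the contextual word vectors
        self.label_emb = nn.Parameter(torch.randn(num_labels, 2 * hidden_dim))
        self.classifier = nn.Linear(6 * hidden_dim, num_labels)

    def forward(self, token_feats):                        # (B, T, feat_dim)
        h, _ = self.bilstm(token_feats)                    # (B, T, 2H) contextual states
        a = torch.softmax(self.attn(h), dim=1)             # (B, T, 1) attention weights
        sent = (a * h).sum(dim=1)                          # (B, 2H) sentence vector

        # compatibility score of every label-word pair (dot product in the joint space)
        compat = torch.einsum('btd,ld->btl', h, self.label_emb)    # (B, T, L)
        word_w = torch.softmax(compat.max(dim=2).values, dim=1)    # weight per word
        label_w = torch.softmax(compat.max(dim=1).values, dim=1)   # weight per label

        weighted_sent = (word_w.unsqueeze(-1) * h).sum(dim=1)      # label-aware sentence rep
        weighted_label = label_w @ self.label_emb                  # sentence-aware label rep

        # classify from the concatenated representations ("double label embedding")
        return self.classifier(torch.cat([sent, weighted_sent, weighted_label], dim=-1))

# toy usage: a batch of 2 sentences, 10 tokens each, 768-d (BERT-sized) features
logits = BiLSTMLabelEmbedding()(torch.randn(2, 10, 768))
print(logits.shape)    # torch.Size([2, 5])

The compatibility matrix is reduced in both directions: its column-wise maxima weight the words (giving a label-aware sentence vector), while its row-wise maxima weight the labels (giving a sentence-aware label vector); how the two weighted vectors are combined for classification is likewise an illustrative choice here.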

Key words: text classification, text representation, label embedding

CLC Number: 

  • TP311.5
[1] ZHANG Y H, SHEN D H, WANG G Y, et al. Deconvolutional paragraph representation learning[C] //Advances in Neural Information Processing Systems. California: Springer, 2017: 4169-4179.
[2] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780.
[3] WANG W L, GAN Z, WANG W Q, et al. Topic compositional neural language model[C] //International Conference on Artificial Intelligence and Statistics. Lanzarote, Spain: PMLR, 2018: 356-365.
[4] GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures[J]. Neural Networks, 2005, 18(5/6):602-610.
[5] NOWAK J, TASPINAR A, SCHERER R. LSTM recurrent neural networks for short text and sentiment classification[C] //International Conference on Artificial Intelligence and Soft Computing. Dubai: Springer, 2017: 553-562.
[6] NIU X L, HOU Y X, WANG P P. Bi-directional LSTM with quantum attention mechanism for sentence modeling[C] //International Conference on Neural Information Processing. Guangzhou: Springer, 2017: 178-188.
[7] BAHDANAU D, CHO K, BENGIO Y, et al. Neural machine translation by jointly learning to align and translate[C] //International Conference on Learning Representations. San Diego: Springer, 2015.
[8] YANG Z, YANG D, DYER C, et al. Hierarchical attention networks for document classification[C] //Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. California: Springer, 2016: 1480-1489.
[9] JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification[C] //Conference of the European Chapter of the Association for Computational Linguistics. Valencia: Springer, 2017: 427-431.
[10] SHEN D, WANG G, WANG W, et al. Baseline needs more love: on simple word-embedding-based models and associated pooling mechanisms[C] //Meeting of the Association for Computational Linguistics. Melbourne: Springer, 2018: 440-450.
[11] REZAEINIA S M, RAHMANI R, GHODSI A, et al. Sentiment analysis based on improved pre-trained word embeddings[J]. Expert Systems with Applications, 2019, 117:139-147.
[12] DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C] //North American Chapter of the Association for Computational Linguistics. Minneapolis: Springer, 2019: 4171-4186.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C] //Advances in Neural Information Processing Systems. California: Springer, 2017: 5998-6008.
[14] MIKOLOV T, CHEN K, CORRADO G S, et al. Efficient estimation of word representations in vector space[C] //International Conference on Learning Representations. Scottsdale: Springer, 2013.
[15] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C] //North American Chapter of the Association for Computational Linguistics. New Orleans: Springer, 2018: 2227-2237.
[16] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[J]. Computation and Language, 2017, 4(6):212-220.
[17] LUO Y. Recurrent neural networks for classifying relations in clinical notes[J]. Journal of Biomedical Informatics, 2017, 72:85-95.
[18] WU D, CHI M G. Long short-term memory with quadratic connections in recursive neural networks for representing compositional semantics[J]. IEEE Access, 2017, 5:16077-16083.
[19] WANG Y, FENG S, WANG D L, et al. Context-aware chinese microblog sentiment classification with bidirectional LSTM[C] //Asia-Pacific Web Conference. Suzhou: Springer, 2016: 594-606.
[20] YANG M, TU W, WANG J, et al. Attention-based LSTM for target-dependent sentiment classification[C] //Proceedings of the Thirty-first AAAI Conference on Artificial Intelligence. San Francisco: AAAI Press, 2017: 5013-5014.
[21] DANILUK M, ROCKTÄSCHEL T, WELBL J, et al. Frustratingly short attention spans in neural language modeling[J]. Computation and Language, 2017, 14(7):812-820.
[22] PARIKH A, TÄCKSTRÖM O, DAS D, et al. A decomposable attention model for natural language inference[C] //Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin: Association for Computational Linguistics, 2016: 2249-2255.
[23] AKATA Z, PERRONNIN F, HARCHAOUI Z, et al. Label-embedding for image classification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(7):1425-1438.
[24] RODRIGUEZ-SERRANO J A, PERRONNIN F, MEYLAN F. Label embedding for text recognition[C] //BMVC. United Kingdom: Springer, 2013: 5.1-5.12.
[25] TANG J, QU M, MEI Q. PTE: predictive text embedding through large-scale heterogeneous text networks[C] //Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco: ACM, 2015: 1165-1174.
[26] ZHANG H, XIAO L, CHEN W, et al. Multi-task label embedding for text classification[C] //Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics, 2018: 4545-4553.
[27] CHUNG J, GULCEHRE C, CHO K, et al. Gated feedback recurrent neural networks[J]. Computer Science, 2015, 37(3):2067-2075.
[28] KIROS R, ZHU Y, SALAKHUTDINOV R R, et al. Skip-thought vectors[C] //Advances in Neural Information Processing Systems. Montreal: Springer, 2015: 3294-3302.
[29] LE Q, MIKOLOV T. Distributed representations of sentences and documents[C] //International Conference on Machine Learning. Montreal: JMLR, 2014: 1188-1196.
[30] ZHANG X, ZHAO J, LECUN Y. Character-level convolutional networks for text classification[C] //Advances in Neural Information Processing Systems. Montreal: Springer, 2015: 649-657.
[31] CONNEAU A, SCHWENK H, BARRAULT L, et al. Very deep convolutional networks for text classification[C] //Conference of the European Chapter of the Association for Computational Linguistics. Vancouver: ACL, 2017: 1107-1116.
[32] KINGMA D P, BA J. ADAM: a method for stochastic optimization[J].Neural Networks, 2014, 15(4):95-103.
[33] HILL F, CHO K, KORHONEN A. Learning distributed representations of sentences from unlabelled data[C] // Proceedings of NAACL-HLT. San Diego: NAACL, 2016: 1367-1377.
[34] AGIRRE E, BANEA C, CARDIE C, et al. Semeval-2014 task 10: multilingual semantic textual similarity[C] //Proceedings of the 8th International Workshop on Semantic Evaluation(SemEval 2014). Dublin: ACL, 2014: 81-91.
[35] JOHNSON A E W, POLLARD T J, SHEN L, et al. MIMIC-III, a freely accessible critical care database[J]. Scientific Data, 2016, 3: 160035.
[36] KIM Y. Convolutional neural networks for sentence classification[C] //Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha: ACL, 2014: 1746-1751.
[37] SHI H R, XIE P T, HU Z T, et al. Towards automated ICD coding using deep learning[J]. Computation and Language, 2017, 23(8):1409-1418.
[38] MULLENBACH J, WIEGREFFE S, DUKE J, et al. Explainable prediction of medical codes from clinical text[C] //Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. New Orleans: ACL, 2018: 1101-1111.
[39] SHEN T, ZHOU T, LONG G, et al. Bi-directional block self-attention for fast and memory-efficient sequence modeling[C] //International Conference on Learning Representations. Vancouver: Springer, 2018: 779-788.