JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2018, Vol. 53, Issue (3): 36-45. doi: 10.6040/j.issn.1671-9352.1.2017.093

Text feature extraction method for sentiment analysis based on order-preserving submatrix and frequent sequential pattern mining

CHEN Xin1,2, XUE Yun1,3*, LU Xin1, LI Wan-li1, ZHAO Hong-ya2, HU Xiao-hui1   

  1. School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, Guangdong, China;
    2. Shenzhen Polytechnic, Shenzhen 518055, Guangdong, China;
    3. Guangdong Provincial Engineering Technology Research Center for Data Science, Guangzhou 510006, Guangdong, China
  Received: 2017-07-04; Online: 2018-03-20; Published: 2018-03-13

Abstract: Feature extraction is a key step in text sentiment analysis and a major factor in the quality of its results. Because online reviews express the same idea in varied wording, a synonym-based TF-IDF (term frequency-inverse document frequency) weight vector is computed using semantic similarity. Then, because online reviews differ in length, local patterns in the feature vectors are identified with the OPSM (order-preserving submatrix) biclustering algorithm. An improved PrefixSpan algorithm is used to detect frequent phrase features for classification that retain word-order information. In addition, factors such as the separation between words are employed to improve the discrimination of sentiment orientation. Finally, the proposed method is applied to a sentiment analysis experiment on product reviews, and the results show that the proposed text feature extraction method performs better.
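The synonym TF-IDF step can be illustrated with a minimal Python sketch. It assumes that synonym groups are already known and hard-codes them in a hypothetical `SYNONYMS` table; the paper derives the groups from semantic similarity, which is not reproduced here. It also assumes the common definitions tf = term count / document length and idf = log(N / df); the paper's exact weighting may differ.

```python
import math
from collections import Counter

# Hypothetical synonym table mapping each word to a group representative.
# The paper groups synonyms by semantic similarity; these groups are
# hard-coded purely for illustration.
SYNONYMS = {"great": "good", "nice": "good", "awful": "bad", "terrible": "bad"}


def canonical(token):
    """Map a token to its synonym-group representative (itself if ungrouped)."""
    return SYNONYMS.get(token, token)


def synonym_tfidf(docs):
    """Return (vocabulary, TF-IDF matrix) after merging synonyms.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), with N documents and df(t) documents containing t
    """
    merged = [[canonical(t) for t in doc] for doc in docs]
    vocab = sorted({t for doc in merged for t in doc})
    n_docs = len(merged)
    # Document frequency is counted on merged tokens, so synonyms share one df.
    df = Counter(t for doc in merged for t in set(doc))
    matrix = []
    for doc in merged:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty reviews
        matrix.append(
            [(counts[t] / length) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, matrix
```

For example, with the reviews `["great", "screen"]` and `["nice", "battery"]`, the words *great* and *nice* both map to *good*, so they share one column and one document frequency instead of splitting their weight across two sparse columns.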
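The order-preserving submatrix (OPSM) idea can be made concrete with a brute-force Python sketch. Rows stand for reviews and columns for feature weights; a set of rows supports an ordered set of columns if every one of those rows strictly increases along that column order. The exhaustive search in the hypothetical helper `best_opsm` only illustrates the definition; it is not the biclustering algorithm used in the paper, and practical OPSM miners avoid enumerating all column permutations.

```python
from itertools import permutations


def supporting_rows(matrix, cols):
    """Indices of rows whose values strictly increase along the column order `cols`."""
    return [
        i
        for i, row in enumerate(matrix)
        if all(row[a] < row[b] for a, b in zip(cols, cols[1:]))
    ]


def best_opsm(matrix, k):
    """Exhaustively find the length-k column order supported by the most rows.

    Brute force over all ordered k-subsets of columns: only practical for
    tiny matrices, but it makes the OPSM definition concrete.
    Returns (column_order, supporting_row_indices).
    """
    if not matrix:
        return (), []
    n_cols = len(matrix[0])
    best_cols, best_rows = (), []
    for cols in permutations(range(n_cols), k):
        rows = supporting_rows(matrix, cols)
        # Keep the first order found with the largest support.
        if len(rows) > len(best_rows):
            best_cols, best_rows = cols, rows
    return best_cols, best_rows
```

With the rows `[0.1, 0.5, 0.3]`, `[0.2, 0.9, 0.4]` and `[0.7, 0.1, 0.2]`, the first two reviews share the column order (0, 2, 1) even though their absolute weights differ, which is the kind of local pattern OPSM looks for. This sketch treats tied weights (for example, two zero TF-IDF entries) as not supporting an order; other tie-handling rules are possible.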
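Frequent phrase mining with word order can be sketched as PrefixSpan-style pattern growth. The sketch assumes that the word-separation factor can be modeled as a hypothetical `max_gap` parameter (the maximum number of tokens allowed between consecutive words of a phrase). It grows each frequent prefix one word at a time while keeping the end positions of every occurrence. It is a plain pattern-growth miner written for illustration, not the authors' improved PrefixSpan, and it omits any class-discrimination filtering.

```python
from collections import defaultdict


def mine_phrases(sequences, min_support, max_gap=1, max_len=3):
    """PrefixSpan-style pattern growth over token sequences with a gap limit.

    A pattern (w1, ..., wk) occurs in a sequence if its words appear in that
    order with at most `max_gap` other tokens between consecutive words.
    Support = number of sequences containing at least one occurrence.
    Returns {pattern: support} for every frequent pattern up to `max_len` words.
    """
    results = {}

    def grow(prefix, projections):
        # projections: {sequence index: set of end positions of `prefix`}
        if len(prefix) == max_len:
            return
        # For each candidate next word, record the new end positions per sequence.
        candidates = defaultdict(lambda: defaultdict(set))
        for s, ends in projections.items():
            seq = sequences[s]
            for end in ends:
                # Next word may follow after 0..max_gap skipped tokens.
                for pos in range(end + 1, min(end + 2 + max_gap, len(seq))):
                    candidates[seq[pos]][s].add(pos)
        for word, proj in sorted(candidates.items()):
            if len(proj) >= min_support:
                pattern = prefix + (word,)
                results[pattern] = len(proj)
                grow(pattern, proj)

    # Length-1 patterns: every position of a word is a valid end position.
    initial = defaultdict(lambda: defaultdict(set))
    for s, seq in enumerate(sequences):
        for pos, word in enumerate(seq):
            initial[word][s].add(pos)
    for word, proj in sorted(initial.items()):
        if len(proj) >= min_support:
            results[(word,)] = len(proj)
            grow((word,), proj)
    return results
```

With `max_gap=1`, the sequences `not very good` and `not good` both contain the ordered phrase (not, good); with `max_gap=0` only the second does, so the phrase falls below a minimum support of 2. A tighter gap therefore keeps only phrases whose words sit close together.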

Key words: feature extraction, biclustering, frequent phrase feature, sentiment analysis

CLC Number: TP391