JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2018, Vol. 53 ›› Issue (3): 36-45. doi: 10.6040/j.issn.1671-9352.1.2017.093

Text feature extraction method for sentiment analysis based on order-preserving submatrix and frequent sequential pattern mining

CHEN Xin1,2, XUE Yun1,3*, LU Xin1, LI Wan-li1, ZHAO Hong-ya2, HU Xiao-hui1   

  1. School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, Guangdong, China;
    2. Shenzhen Polytechnic, Shenzhen 518055, Guangdong, China;
    3. Guangdong Provincial Engineering Technology Research Center for Data Science, Guangzhou 510006, Guangdong, China
  • Received: 2017-07-04 Online: 2018-03-20 Published: 2018-03-13

Abstract: Feature extraction is a key step in text sentiment analysis and a main factor affecting the result. To handle the varied expressions found in online reviews, a synonym-based TF-IDF (term frequency-inverse document frequency) weight vector is obtained using semantic similarity. Then, because online reviews differ in length, local patterns in the feature vectors are identified with the OPSM (order-preserving submatrix) biclustering algorithm. We improve the PrefixSpan algorithm to detect frequent classification phrase features, which contain word order information. Furthermore, some important factors, such as the separation between words, are employed to improve the discriminative ability for sentiment orientation. Finally, the proposed method is applied to a sentiment analysis experiment on product reviews, and the results show that the proposed text feature extraction method achieves better performance.
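
To illustrate the frequent sequential pattern mining step named in the abstract, the following is a minimal sketch of the standard PrefixSpan idea applied to tokenized reviews. It is not the improved variant proposed in the paper: it ignores word separation constraints and class labels, and the function name `prefixspan`, the parameter `min_support`, and the toy reviews are assumptions made for illustration only.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Basic PrefixSpan over sequences of single items (hypothetical helper).

    Returns a dict mapping each frequent pattern (a tuple of words, in order,
    gaps allowed) to the number of sequences that contain it.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the suffix
        # of each sequence that remains after matching the current prefix.
        counts = defaultdict(set)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item].add(sid)

        for item, sids in counts.items():
            if len(sids) < min_support:
                continue  # prune: extensions of an infrequent pattern are infrequent
            pattern = prefix + (item,)
            results[pattern] = len(sids)

            # Build the projected database for the extended pattern, using the
            # first occurrence of the item in each remaining suffix.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Toy tokenized reviews (illustrative data, not from the paper).
reviews = [
    ["not", "very", "good"],
    ["not", "good"],
    ["very", "good", "price"],
]
patterns = prefixspan(reviews, min_support=2)
```

In this toy run, the ordered pattern ("not", "good") is found in two reviews even though another word separates the two tokens in the first review, which is the kind of word order information that a bag-of-words TF-IDF vector alone does not capture.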

Key words: feature extraction, biclustering, frequent phrase feature, sentiment analysis

CLC Number: TP391
