JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE) ›› 2014, Vol. 49 ›› Issue (09): 129-134. doi: 10.6040/j.issn.1671-9352.2.2014.327

Research of the text clustering based on LDA using in network public opinion analysis

WANG Shao-peng¹, PENG Yan², WANG Jie²

  1. College of Information Engineering, Capital Normal University, Beijing 100048, China;
  2. School of Management, Capital Normal University, Beijing 100089, China
  • Received: 2014-06-24  Revised: 2014-08-28  Online: 2014-09-20  Published: 2014-09-30

Abstract: Traditional word-based text clustering algorithms may ignore information hidden in the text. To address this problem, a text clustering algorithm based on the latent Dirichlet allocation (LDA) topic model is proposed. The algorithm computes text similarity with both TF-IDF and the LDA topic model, determines the fusion coefficient of the two similarities through a cost function, and obtains the final similarity between texts as their linear combination. The clustering result is evaluated with the F-measure. When constructing the LDA model, the algorithm uses Gibbs sampling to estimate the parameters and a standard Bayesian statistical method to determine the optimal number of topics. Simulation results show that the proposed algorithm outperforms the traditional text clustering algorithm in both the accuracy and the stability of the clustering results.
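
The linear combination of the two similarities can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the similarity measure, so cosine similarity is assumed for both the TF-IDF vectors and the topic distributions. The function names, the fusion coefficient `lam`, and the toy document-topic values in `thetas` are all hypothetical. In the paper the topic distributions would come from an LDA model fitted by Gibbs sampling, and the coefficient would be chosen through the cost function.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) from tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        term_freq = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(p, q):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def fused_similarity(tfidf_a, tfidf_b, theta_a, theta_b, lam):
    """Linear combination of word-level and topic-level similarity.

    lam is the (hypothetical) fusion coefficient in [0, 1]:
    lam = 1 uses TF-IDF only, lam = 0 uses topic distributions only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    word_sim = cosine_sparse(tfidf_a, tfidf_b)
    topic_sim = cosine_dense(theta_a, theta_b)
    return lam * word_sim + (1.0 - lam) * topic_sim


# Toy data: three tokenized documents and made-up document-topic
# distributions standing in for the output of a fitted LDA model.
docs = [
    ["stock", "market", "fall"],
    ["stock", "market", "rise"],
    ["football", "match", "win"],
]
thetas = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
vectors = tfidf_vectors(docs)

sim_related = fused_similarity(vectors[0], vectors[1], thetas[0], thetas[1], lam=0.5)
sim_unrelated = fused_similarity(vectors[0], vectors[2], thetas[0], thetas[2], lam=0.5)
```

With these toy inputs, the two finance documents score higher than the finance and sports pair, because they share both vocabulary and a dominant topic. The resulting pairwise similarities could then be passed to any similarity-based clustering routine.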

Key words: topic model, LDA, TF-IDF, text similarity, network public opinion

