J4


An improved k-means algorithm for document clustering

SUO Hong-guang1,2, WANG Yu-wei2

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; 2. School of Computer & Communication Engineering, China University of Petroleum, Dongying 257061, Shandong, China
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2006-10-24 Published: 2006-10-24
  • Contact: SUO Hong-guang

Abstract: The k-means algorithm is a popular method for document clustering, but it often gets stuck at a local maximum far from the optimal solution. A procedure based on local search is used to improve the algorithm. A formula for the change in the objective function is also derived, which can be used to repartition the clusters. The procedure runs a suitable number of local iterations to enlarge the search space. Theoretical analysis and experimental results show that the improved algorithm effectively improves k-means clustering, and that its computational cost remains linear in the size of the document collection.
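
The abstract does not reproduce the derived formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the spherical k-means setting commonly used with the vector space model: documents are unit-length vectors, and the objective to maximize is the sum of the norms of the per-cluster sum vectors. Under that assumption, the change in the objective caused by moving one document between two clusters can be computed from the two cluster sums alone. All function names (`move_gain`, `first_variation`, `improved_kmeans`) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)

def cluster_sums(docs, labels, k):
    # Sum vector of each cluster; its norm is that cluster's share of the objective.
    sums = [[0.0] * len(docs[0]) for _ in range(k)]
    for x, c in zip(docs, labels):
        for t, value in enumerate(x):
            sums[c][t] += value
    return sums

def objective(sums):
    # Assumed objective (to maximize): sum of the norms of the cluster sum vectors.
    return sum(math.sqrt(dot(s, s)) for s in sums)

def kmeans_pass(docs, labels, k, max_iter=100):
    # Batch spherical k-means: reassign every document to the centroid
    # with the highest cosine similarity until the labels stop changing.
    for _ in range(max_iter):
        centroids = [normalize(s) for s in cluster_sums(docs, labels, k)]
        new = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in docs]
        if new == labels:
            break
        labels = new
    return labels

def move_gain(x, s_from, s_to):
    # Objective change when unit vector x leaves the cluster with sum s_from
    # and joins the cluster with sum s_to, using |s +/- x|^2 = |s|^2 +/- 2 s.x + 1.
    nf2, nt2 = dot(s_from, s_from), dot(s_to, s_to)
    after_from = math.sqrt(max(nf2 - 2.0 * dot(s_from, x) + 1.0, 0.0))
    after_to = math.sqrt(nt2 + 2.0 * dot(s_to, x) + 1.0)
    return (after_from - math.sqrt(nf2)) + (after_to - math.sqrt(nt2))

def first_variation(docs, labels, k, tol=1e-9):
    # Local search step: apply the single move with the largest positive gain.
    sums = cluster_sums(docs, labels, k)
    best_gain, best_doc, best_cluster = tol, None, None
    for i, x in enumerate(docs):
        for j in range(k):
            if j != labels[i]:
                gain = move_gain(x, sums[labels[i]], sums[j])
                if gain > best_gain:
                    best_gain, best_doc, best_cluster = gain, i, j
    if best_doc is None:
        return labels, False
    labels = list(labels)
    labels[best_doc] = best_cluster
    return labels, True

def improved_kmeans(docs, labels, k, max_rounds=50):
    # Alternate batch k-means with single-document moves until no move helps.
    docs = [normalize(d) for d in docs]
    for _ in range(max_rounds):
        labels = kmeans_pass(docs, labels, k)
        labels, moved = first_variation(docs, labels, k)
        if not moved:
            break
    return labels
```

In this sketch, each candidate move is scored in time proportional to the vector dimension, so one local-search scan costs on the order of (number of documents) × k × (dimension), which is consistent with a cost that grows linearly in the collection size. A call such as `improved_kmeans(doc_vectors, initial_labels, k)` takes a list of term-weight vectors and an initial assignment; both argument names are placeholders.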

Key words: local iteration, vector space model, k-means, document clustering

CLC Number: 

  • TP391