J4


Chinese Web page feature selection method based on sequential data mining

GU Feng,LIU Chen-xi,WU Yangyang   

  1. Department of Computer Science and Technology, Huaqiao Univ., Quanzhou 362021, Fujian, China
  • Received:2006-03-29 Revised:1900-01-01 Online:2006-10-24 Published:2006-10-24
  • Contact: GU Feng

Abstract: A method is proposed for selecting candidate features from Chinese web pages on the basis of sequential data mining, and it is applied in a Chinese web page classification model. The method uses an improved PAT tree data structure to mine the frequent strings in the same class of Chinese web pages, calculates their net frequency, extracts frequent meaningful words, phrases, and English words, and obtains the text features with the help of the CHI algorithm. Experiments show that the algorithm not only recovers most of the features selected by the traditional algorithm but also finds new meaningful personal names, place names, new words, phrases, and foreign words.

Key words: Chinese web page classification, frequent string, net frequency, PAT tree, sequential data mining
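
The pipeline summarized in the abstract (mine frequent strings per class, then rank them with CHI) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a plain dictionary of substring counts stands in for the improved PAT tree, the net frequency step is omitted because the abstract does not define it, CHI is taken to be the standard chi-square statistic on a 2x2 term/class contingency table, and the names `chi_square`, `substrings`, `select_features`, `min_df`, `max_len`, and `top_k` are hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score for one string/class pair from a 2x2 contingency table.

    a: pages in the class that contain the string
    b: pages outside the class that contain the string
    c: pages in the class that lack the string
    d: pages outside the class that lack the string
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the string carries no class information
    return n * (a * d - c * b) ** 2 / denom


def substrings(text, max_len):
    """Distinct substrings of length 1..max_len (assumed stand-in for a PAT tree walk)."""
    return {
        text[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(text) - k + 1)
    }


def select_features(pages, target, min_df=2, max_len=4, top_k=5):
    """Rank strings that are frequent in the target class by chi-square.

    pages: list of (text, label) pairs.
    min_df: minimum number of target-class pages that must contain a string.
    """
    in_class = [text for text, label in pages if label == target]
    out_class = [text for text, label in pages if label != target]

    # Document frequency of every candidate string inside and outside the class.
    df_in = Counter(s for text in in_class for s in substrings(text, max_len))
    df_out = Counter(s for text in out_class for s in substrings(text, max_len))

    scored = []
    for s, a in df_in.items():
        if a < min_df:
            continue  # not a frequent string in this class
        b = df_out.get(s, 0)
        c = len(in_class) - a
        d = len(out_class) - b
        scored.append((s, chi_square(a, b, c, d)))

    # Highest score first; prefer longer strings on ties, then code-point order.
    scored.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    pages = [
        ("足球比赛", "sports"),
        ("足球新闻", "sports"),
        ("股票新闻", "finance"),
        ("股票市场", "finance"),
    ]
    print(select_features(pages, "sports"))
```

Because candidates are raw character strings rather than dictionary words, a sketch like this can surface names and new words that a word-segmentation dictionary would miss, which is the kind of benefit the abstract reports. Enumerating every substring costs far more memory than a PAT tree would on a real corpus.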
