Chinese Web page feature selection method based on sequential data mining
- GU Feng, LIU Chen-xi, WU Yangyang
J4. 2006, 41(3):
Abstract: A method based on sequential data mining is proposed for selecting candidate features from Chinese Web pages, and it is applied in a Chinese Web page classification model. The method uses an improved PAT tree data structure to mine frequent strings from Web pages of the same class, calculates their net frequency to extract frequent meaningful words, phrases, and English words, and then obtains text features with the help of the CHI algorithm. Experiments show that the algorithm not only mines most of the features selected by the traditional algorithm, but also mines some new meaningful personal names, place names, new words, phrases, and foreign words.
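
The pipeline summarized in the abstract (mine frequent strings per class, then rank them with CHI) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a plain substring count as a stand-in for the improved PAT tree, it omits the net-frequency step (whose exact definition the abstract does not give), and it uses the standard chi-square statistic for a term/class contingency table. The names `substrings`, `chi_square`, `select_features`, `min_support`, `max_len`, and `top_k` are hypothetical.

```python
from collections import Counter


def substrings(text, max_len=4):
    """Return the set of all substrings of `text` with length 1..max_len."""
    return {text[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(text) - n + 1)}


def chi_square(a, b, c, d):
    """Chi-square (CHI) statistic for a term/class 2x2 table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, target, min_support=2, max_len=4, top_k=5):
    """Rank strings that are frequent in class `target` by their CHI score.

    Assumption: document frequency from brute-force substring counting
    replaces the PAT tree, and no net-frequency filtering is applied.
    """
    in_class, out_class = Counter(), Counter()
    for doc, label in zip(docs, labels):
        counter = in_class if label == target else out_class
        counter.update(substrings(doc, max_len))  # one count per document

    n_in = sum(1 for label in labels if label == target)
    n_out = len(labels) - n_in

    scored = []
    for term, a in in_class.items():
        if a < min_support:  # keep only strings frequent within the class
            continue
        b = out_class[term]
        scored.append((term, chi_square(a, b, n_in - a, n_out - b)))

    # Highest CHI first; ties broken by the string itself for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    docs = ["足球比赛", "足球新闻", "股票市场", "股票新闻"]
    labels = ["sports", "sports", "finance", "finance"]
    for term, score in select_features(docs, labels, "sports"):
        print(term, score)
```

Because the sketch skips net-frequency filtering, fragments such as 足 and 球 survive alongside the full word 足球 in this toy corpus; pruning such fragments is the kind of task the abstract assigns to the net-frequency calculation.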