J4
GU Feng, LIU Chen-xi, WU Yangyang
Abstract: A method is proposed for selecting feature candidates from Chinese websites based on sequential data mining, and it is applied to a Chinese website classification model. The method uses an improved PAT tree data structure to mine the frequent strings within each class of Chinese websites, calculates their net frequency, extracts frequent meaningful words, phrases, and English words, and obtains text features with the help of the CHI algorithm. Experiments show that this algorithm not only mines most of the features selected by the traditional algorithm but also finds new meaningful personal names, place names, new words, phrases, and foreign words.
Key words: Chinese web page classification, frequent string, net frequency, PAT tree, sequential data mining
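As a rough illustration of the pipeline the abstract describes (mine frequent strings per class, then score them as features), the following Python sketch may help. It is not the authors' implementation: it swaps the improved PAT tree for a naive substring counter, omits the net-frequency step entirely because the abstract does not define it, and assumes that "CHI" refers to the chi-square statistic commonly used for text feature selection. The function names `frequent_substrings` and `chi_square` and all parameters are hypothetical.

```python
from collections import Counter


def frequent_substrings(docs, min_len=2, max_len=4, min_docs=2):
    """Return {substring: document frequency} for substrings that occur
    in at least `min_docs` documents.

    Naive stand-in for the PAT tree (assumption): every substring of
    length min_len..max_len is enumerated explicitly.
    """
    doc_freq = Counter()
    for doc in docs:
        seen = set()  # count each substring at most once per document
        for n in range(min_len, max_len + 1):
            for i in range(len(doc) - n + 1):
                seen.add(doc[i:i + n])
        doc_freq.update(seen)
    return {s: c for s, c in doc_freq.items() if c >= min_docs}


def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (a * d - c * b) ** 2 / denom
```

In this sketch, candidate strings returned by `frequent_substrings` for one class would each be scored with `chi_square`, and the highest-scoring strings kept as features for that class. A real system would replace the naive enumeration with a suffix-based index such as the PAT tree the paper uses.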
GU Feng, LIU Chen-xi, WU Yangyang. Chinese Web page feature selection method based on sequential data mining[J]. J4, 2006, 41(3): 95-99.
URL: http://lxbwk.njournal.sdu.edu.cn/EN/
http://lxbwk.njournal.sdu.edu.cn/EN/Y2006/V41/I3/95