J4 ›› 2013, Vol. 48 ›› Issue (11): 99-104.


A space-time-efficient multi-category text categorization algorithm

LIU Wu-ying, YI Mian-zhu, ZHANG Xing   

  • Received:2013-09-02 Online:2013-11-20 Published:2013-11-25

Abstract:

Low space-time complexity is a consistently desired property of multi-category text categorization algorithms. An investigation of token frequencies in a collection of news documents confirms that the token frequency distribution obeys the ubiquitous power law. Based on this property of the power law, a novel data structure, the multi-category token frequency index, is designed, and on top of it a multi-category text categorization algorithm with low space-time complexity is proposed. Experimental results on the TanCorp data set show that the proposed algorithm is space-time-efficient when applied to multi-category news document categorization.
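
The abstract does not spell out the layout of the multi-category token frequency index or the scoring rule built on it. As a rough illustration only, the following Python sketch assumes one plausible reading: an index that maps each token to its per-category frequencies, and a classifier that sums add-one-smoothed log relative frequencies per category. The class name `TokenFrequencyIndex`, its methods, and the scoring rule are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class TokenFrequencyIndex:
    """Hypothetical index: token -> {category: frequency}. Not the paper's structure."""

    def __init__(self):
        self.index = defaultdict(Counter)   # token -> per-category counts
        self.category_totals = Counter()    # category -> total tokens seen

    def add(self, tokens, category):
        """Record the tokens of one training document under its category."""
        for tok in tokens:
            self.index[tok][category] += 1
            self.category_totals[category] += 1

    def classify(self, tokens):
        """Return the category with the highest smoothed log-frequency score."""
        vocab_size = len(self.index)
        scores = {}
        for cat, total in self.category_totals.items():
            score = 0.0
            for tok in tokens:
                counts = self.index.get(tok)           # one lookup per token
                count = counts[cat] if counts else 0   # unseen token -> 0
                # Add-one smoothing (an assumption made for this sketch).
                score += math.log((count + 1) / (total + vocab_size))
            scores[cat] = score
        return max(scores, key=scores.get)


# Toy usage with made-up documents.
idx = TokenFrequencyIndex()
idx.add("stock market shares rise".split(), "finance")
idx.add("team wins match goal".split(), "sports")
print(idx.classify("market shares".split()))
```

In a layout like this, every token is stored once regardless of the number of categories, and classifying a document costs one index lookup per token per category. Under a power-law token distribution most tokens are rare, so most per-token count tables stay small.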

Key words: multi-category text categorization; algorithm complexity; multi-category token frequency index; power law; news document

CLC Number: TP391