JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2016, Vol. 51, Issue (5): 87-93. doi: 10.6040/j.issn.1671-9352.1.2015.E17


Feature selection combined with the global and local information(GLFS)

WAN Zhong-ying, WANG Ming-wen, ZUO Jia-li, WAN Jian-yi   

  1. School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China
  • Received: 2015-09-25 Online: 2016-05-20 Published: 2016-05-16

Abstract: The choice of feature selection method directly affects the performance of text categorization. Traditional feature selection algorithms take a global approach and ignore local, per-document information, which can leave many training documents with no selected features at all. This paper therefore proposes GLFS, a feature selection algorithm built on the ALOFT method that combines traditional global feature scores with the contribution rate of a word to an individual document, unifying global and local information. Experimental results on the Reuters and Fudan data sets show that the method ensures that every document retains at least one feature word and improves classification performance. The paper also discusses the influence of the new feature weighting method on classification.
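The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the GLFS formula, so the sketch assumes that the global score is plain document frequency (a stand-in for measures such as information gain) and that the local contribution rate is a term's share of the tokens in a document, combined by multiplication. The function names `document_frequency` and `select_features` are hypothetical. With `use_local=False` the loop behaves like an ALOFT-style pass, in which each document contributes its globally best term.

```python
from collections import Counter


def document_frequency(docs):
    """Global score (assumed): number of documents that contain each term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def select_features(docs, global_score, use_local=True):
    """Keep the best-scoring term of every document, so that no
    non-empty document is left without a selected feature."""
    selected = []
    for doc in docs:
        if not doc:
            continue
        tf = Counter(doc)

        def score(term):
            # Assumed local contribution: share of the document's tokens.
            local = tf[term] / len(doc) if use_local else 1.0
            return global_score[term] * local

        # Iterate in sorted order so ties resolve deterministically.
        best = max(sorted(tf), key=score)
        if best not in selected:
            selected.append(best)
    return selected


# Toy corpus (made up for illustration): each document is a token list.
docs = [
    ["oil", "price", "oil", "market"],
    ["market", "stock", "stock", "stock", "stock"],
    ["wheat", "grain", "market"],
]
gs = document_frequency(docs)
global_only = select_features(docs, gs, use_local=False)
global_local = select_features(docs, gs, use_local=True)
print(global_only, global_local)
```

On this toy corpus the global-only pass keeps a single term that occurs in every document, while the combined score also keeps a term that dominates one document. Both variants leave each document with at least one selected feature.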

Key words: global and local information, feature selection, text classification, ALOFT, feature weight

CLC Number: TP391