
Journal of Shandong University (Natural Science) ›› 2017, Vol. 52 ›› Issue (3): 91-96. doi: 10.6040/j.issn.1671-9352.4.2016.080


Cost-sensitive feature selection via manifold learning

HUANG Tian-yi, ZHU William*

  1. Laboratory of Granular Computing, Minnan Normal University, Zhangzhou 363000, Fujian, China
  • Received: 2016-06-01 Online: 2017-03-20 Published: 2017-03-20
  • Corresponding author: ZHU William (born 1962), male, Ph.D., professor; research interests: artificial intelligence, rough sets, data mining. E-mail: williamfengzhu@163.com
  • About the author: HUANG Tian-yi (born 1992), male, master's student; research interests: machine learning, data mining. E-mail: weather33@126.com
  • Supported by: National Natural Science Foundation of China, General Program (61379049, 61472406); Natural Science Foundation of Fujian Province (2015J01269)

Abstract: To obtain a feature subset with low misclassification cost, we define a cost-distance between samples and incorporate it into an existing feature selection framework. Combining manifold learning with cost-sensitive feature selection yields a new method, cost-sensitive feature selection via manifold learning (CFSM). Previous cost-sensitive feature selection algorithms consider only the correlation between the features and the misclassification cost, and they select features one at a time. CFSM uses both that correlation and the discriminative information implied within the data, which improves feature selection performance. Experiments on six real-world datasets show that CFSM outperforms existing related algorithms.
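The abstract gives no formulas, so the following is only a rough Python sketch of the general idea (a cost-aware sample graph, a manifold embedding, then joint feature scoring), not the authors' CFSM. The cost-distance used here (squared Euclidean distance inflated by the pairwise misclassification cost), the Laplacian-eigenmaps-style embedding, the l2,1-regularized regression, and every function and parameter name are assumptions made for illustration.

```python
import numpy as np


def cost_affinity(X, y, cost, sigma=1.0):
    """Heat-kernel affinity on an ASSUMED cost-distance (not the paper's definition).

    cost[i, j] is the cost of predicting class j for a sample whose true class is i.
    """
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    directed = cost[y[:, None], y[None, :]]
    pair_cost = directed + directed.T  # symmetric; zero within a class if diag(cost) = 0
    cost_dist = sq_dist * (1.0 + pair_cost)  # costly-to-confuse pairs are pushed apart
    W = np.exp(-cost_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W, n_dims):
    """Embedding from the smallest nontrivial eigenvectors of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return vecs[:, 1:n_dims + 1]  # skip the trivial first eigenvector


def l21_feature_scores(X, Y, lam=0.1, n_iter=30, eps=1e-8):
    """Row norms of B minimizing ||XB - Y||_F^2 + lam * ||B||_{2,1} (reweighted solves)."""
    Xc = X - X.mean(axis=0)
    G = Xc.T @ Xc
    XtY = Xc.T @ Y
    B = np.linalg.solve(G + lam * np.eye(X.shape[1]), XtY)  # ridge starting point
    for _ in range(n_iter):
        row_norms = np.sqrt((B ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
        B = np.linalg.solve(G + lam * D, XtY)
    return np.sqrt((B ** 2).sum(axis=1))


def select_features(X, y, cost, k, n_dims=1, sigma=1.0, lam=0.1):
    """Return the indices of the k highest-scoring features and all scores."""
    W = cost_affinity(X, y, cost, sigma)
    Y = spectral_embedding(W, n_dims)
    scores = l21_feature_scores(X, Y, lam)
    return np.argsort(scores)[::-1][:k], scores


# Toy usage: feature 0 separates the two classes, the other four are noise.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 30)
X_demo = rng.normal(size=(60, 5))
X_demo[:, 0] += np.where(y_demo == 0, -3.0, 3.0)
cost_demo = np.array([[0.0, 1.0], [5.0, 0.0]])  # hypothetical asymmetric costs
top, scores_demo = select_features(X_demo, y_demo, cost_demo, k=2, sigma=3.0)
print(top, scores_demo)
```

Because all features are scored through one regression onto the embedding, they are weighed jointly and not one at a time. A faithful implementation would need the cost-distance and objective from the full paper.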

Key words: cost-sensitive, manifold learning, feature selection, supervised learning

CLC Number: O151.26