
Journal of Shandong University (Natural Science) ›› 2023, Vol. 58 ›› Issue (1): 67-75. doi: 10.6040/j.issn.1671-9352.2.2021.139


AdaBoost algorithm based on model decision tree

LIANG Yun1, MEN Chang-qian1, WANG Wen-jian2*

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China
  • Published: 2023-02-12
  • About the authors: LIANG Yun (1997— ), female, M.S. candidate; research interests: machine learning. E-mail: 2372570482@qq.com. *Corresponding author: WANG Wen-jian (1968— ), Ph.D., professor; research interests: machine learning, computational intelligence, and image processing. E-mail: wjwang@sxu.edu.cn
  • Supported by: National Natural Science Foundation of China (62076154, U21A20513, U1805263); Central Government Guided Local Science and Technology Development Fund (YDZX20201400001224); Natural Science Foundation of Shanxi Province (201901D111030); Key R&D Program for International Science and Technology Cooperation of Shanxi Province (201903D421050)




Abstract: The AdaBoost algorithm is an ensemble algorithm that combines multiple base learners under a suitable strategy to produce a strong learner; its performance depends on the accuracy and diversity of the base learners. However, the low classification accuracy of weak learners often leads to poor performance of the final strong classifier. To further improve classification accuracy, this paper proposes an MDTAda model. It first uses the Gini index to iteratively construct an incomplete decision tree, then attaches a simple classifier to each impure pseudo-leaf node to obtain a model decision tree (MDT). The MDT serves as the base classifier of the AdaBoost algorithm, and a weighted average of the MDTs yields the strong classifier. Experiments on standard datasets show that, compared with the traditional AdaBoost algorithm, the proposed algorithm achieves better generalization performance and a better margin distribution, and needs fewer iterations to reach the same accuracy as AdaBoost.

Key words: Gini index, decision tree, ensemble learning, AdaBoost algorithm, margin analysis
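For intuition, the MDT-as-base-learner idea described in the abstract can be sketched in a few lines. This is not the authors' implementation: here a hypothetical `MDTStump` is a single weighted-Gini split whose impure leaves carry a simple nearest-centroid classifier (standing in for the paper's simple leaf models), plugged into standard discrete AdaBoost; all class and function names are invented for this sketch.

```python
# Minimal sketch of boosting over "model decision trees" (labels in {-1, +1}).
# Assumptions: binary classification, numeric features, depth-1 trees only.
import numpy as np

def gini_split(X, y, w):
    """Return (feature, threshold) minimizing the weighted Gini impurity."""
    best_imp, best_j, best_t = np.inf, 0, X[0, 0]
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:        # candidate thresholds
            left = X[:, j] <= t
            imp = 0.0
            for mask in (left, ~left):
                wm = w[mask].sum()
                if wm > 0:
                    p = w[mask][y[mask] == 1].sum() / wm
                    imp += wm * 2.0 * p * (1.0 - p)
            if imp < best_imp:
                best_imp, best_j, best_t = imp, j, t
    return best_j, best_t

class Leaf:
    """Pure leaf: constant label. Impure leaf: simple nearest-centroid rule."""
    def __init__(self, X, y, w):
        if np.all(y == y[0]):
            self.const, self.c_pos, self.c_neg = y[0], None, None
        else:
            self.const = None
            self.c_pos = np.average(X[y == 1], axis=0, weights=w[y == 1])
            self.c_neg = np.average(X[y == -1], axis=0, weights=w[y == -1])
    def predict(self, X):
        if self.const is not None:
            return np.full(len(X), self.const)
        d_pos = np.linalg.norm(X - self.c_pos, axis=1)
        d_neg = np.linalg.norm(X - self.c_neg, axis=1)
        return np.where(d_pos <= d_neg, 1.0, -1.0)

class MDTStump:
    """One Gini split with model leaves -- a depth-1 stand-in for an MDT."""
    def fit(self, X, y, w):
        self.j, self.t = gini_split(X, y, w)
        m = X[:, self.j] <= self.t
        self.left, self.right = Leaf(X[m], y[m], w[m]), Leaf(X[~m], y[~m], w[~m])
        return self
    def predict(self, X):
        m = X[:, self.j] <= self.t
        out = np.empty(len(X))
        out[m] = self.left.predict(X[m])
        out[~m] = self.right.predict(X[~m])
        return out

def adaboost_mdt(X, y, T=10):
    """Standard discrete AdaBoost with MDTStump base learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(T):
        h = MDTStump().fit(X, y, w)
        pred = h.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # learner weight
        w = w * np.exp(-alpha * y * pred)        # re-weight samples
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * h.predict(Xq) for a, h in zip(alphas, learners)))

# XOR-like toy data: an ordinary stump cannot beat 50% training error here,
# but the model leaves separate the classes.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
predict = adaboost_mdt(X, y, T=3)
```

The sketch mirrors the abstract's structure in miniature: a Gini-guided tree that stops early (here after one split), simple classifiers on its impure leaves, and a weighted vote over boosted rounds.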

CLC number:

  • TP181