
Journal of Shandong University (Natural Science) ›› 2023, Vol. 58 ›› Issue (1): 67-75. doi: 10.6040/j.issn.1671-9352.2.2021.139


AdaBoost algorithm based on model decision tree

LIANG Yun1, MEN Chang-qian1, WANG Wen-jian2*

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China
  • Published: 2023-02-12
  • About the authors: LIANG Yun (1997— ), female, M.S. candidate; research interests: machine learning. E-mail: 2372570482@qq.com. *Corresponding author: WANG Wen-jian (1968— ), Ph.D., professor; research interests: machine learning, computational intelligence, and image processing. E-mail: wjwang@sxu.edu.cn
  • Supported by: National Natural Science Foundation of China (62076154, U21A20513, U1805263); Central Government Guided Local Science and Technology Development Fund (YDZX20201400001224); Natural Science Foundation of Shanxi Province (201901D111030); Key R&D Program for International Science and Technology Cooperation of Shanxi Province (201903D421050)




Abstract: The AdaBoost algorithm is an ensemble algorithm that combines multiple base learners under a suitable strategy to produce a strong learner; its performance depends on the accuracy and diversity of the base learners. However, the low classification accuracy of weak learners often leads to poor performance of the final strong classifier. To further improve classification accuracy, this paper proposes an MDTAda model. It first uses the Gini index to iteratively construct an incomplete decision tree, then attaches a simple classifier to each impure pseudo-leaf node to obtain a model decision tree (MDT). The MDT serves as the base classifier of the AdaBoost algorithm, and a weighted average of the MDTs yields the strong classifier. Experiments on standard datasets show that, compared with the traditional AdaBoost algorithm, the proposed algorithm achieves better generalization performance and a better margin distribution, and needs fewer iterations to reach the same accuracy as AdaBoost.

Key words: Gini index, decision tree, ensemble learning, AdaBoost algorithm, margin analysis
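For intuition, the MDT-as-base-learner idea described in the abstract can be sketched in a few lines. This is not the authors' implementation: here a hypothetical `MDTStump` is a single weighted-Gini split whose impure leaves carry a simple nearest-centroid classifier (standing in for the paper's simple leaf models), plugged into standard discrete AdaBoost; all class and function names are invented for this sketch.

```python
# Minimal sketch of boosting over "model decision trees" (labels in {-1, +1}).
# Assumptions: binary classification, numeric features, depth-1 trees only.
import numpy as np

def gini_split(X, y, w):
    """Return (feature, threshold) minimizing the weighted Gini impurity."""
    best_imp, best_j, best_t = np.inf, 0, X[0, 0]
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:        # candidate thresholds
            left = X[:, j] <= t
            imp = 0.0
            for mask in (left, ~left):
                wm = w[mask].sum()
                if wm > 0:
                    p = w[mask][y[mask] == 1].sum() / wm
                    imp += wm * 2.0 * p * (1.0 - p)
            if imp < best_imp:
                best_imp, best_j, best_t = imp, j, t
    return best_j, best_t

class Leaf:
    """Pure leaf: constant label. Impure leaf: simple nearest-centroid rule."""
    def __init__(self, X, y, w):
        if np.all(y == y[0]):
            self.const, self.c_pos, self.c_neg = y[0], None, None
        else:
            self.const = None
            self.c_pos = np.average(X[y == 1], axis=0, weights=w[y == 1])
            self.c_neg = np.average(X[y == -1], axis=0, weights=w[y == -1])
    def predict(self, X):
        if self.const is not None:
            return np.full(len(X), self.const)
        d_pos = np.linalg.norm(X - self.c_pos, axis=1)
        d_neg = np.linalg.norm(X - self.c_neg, axis=1)
        return np.where(d_pos <= d_neg, 1.0, -1.0)

class MDTStump:
    """One Gini split with model leaves -- a depth-1 stand-in for an MDT."""
    def fit(self, X, y, w):
        self.j, self.t = gini_split(X, y, w)
        m = X[:, self.j] <= self.t
        self.left, self.right = Leaf(X[m], y[m], w[m]), Leaf(X[~m], y[~m], w[~m])
        return self
    def predict(self, X):
        m = X[:, self.j] <= self.t
        out = np.empty(len(X))
        out[m] = self.left.predict(X[m])
        out[~m] = self.right.predict(X[~m])
        return out

def adaboost_mdt(X, y, T=10):
    """Standard discrete AdaBoost with MDTStump base learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(T):
        h = MDTStump().fit(X, y, w)
        pred = h.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # learner weight
        w = w * np.exp(-alpha * y * pred)        # re-weight samples
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * h.predict(Xq) for a, h in zip(alphas, learners)))

# XOR-like toy data: an ordinary stump cannot beat 50% training error here,
# but the model leaves separate the classes.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
predict = adaboost_mdt(X, y, T=3)
```

The sketch mirrors the abstract's structure in miniature: a Gini-guided tree that stops early (here after one split), simple classifiers on its impure leaves, and a weighted vote over boosted rounds.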

CLC number:

  • TP181