JOURNAL OF SHANDONG UNIVERSITY (NATURAL SCIENCE), 2023, Vol. 58, Issue (1): 67-75. doi: 10.6040/j.issn.1671-9352.2.2021.139


AdaBoost algorithm based on model decision tree

LIANG Yun1, MEN Chang-qian1, WANG Wen-jian2*   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, China;
    2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi, China
  Published: 2023-02-12

Abstract: The AdaBoost algorithm is an ensemble method that combines multiple base learners through a suitable weighting strategy to produce a strong learner; its performance depends on both the accuracy and the diversity of the base learners. Because weak learners often have poor classification accuracy, the resulting strong classifier can also perform poorly. To further improve classification accuracy, this paper proposes the MDTAda model: the Gini index is first used to iteratively construct an incomplete decision tree; a simple classifier is then attached to each impure pseudo-leaf node of that tree to obtain a model decision tree (MDT); finally, the MDT serves as the base classifier of the AdaBoost algorithm, and the base classifiers are combined by weighted averaging into a strong classifier. Experiments on standard datasets show that, compared with the traditional AdaBoost algorithm, the proposed method achieves better generalization performance and a better margin distribution, and requires fewer iterations to reach the same accuracy as AdaBoost.
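As a rough illustration of the boosting loop described in the abstract, the sketch below implements discrete AdaBoost in plain Python, with a decision stump standing in for the base classifier. Note this is a hedged simplification: the paper's actual base learner is an MDT (a Gini-built incomplete tree with simple classifiers at impure pseudo-leaf nodes), whose construction is not reproduced here, and all function names are hypothetical.

```python
import math

def stump_train(X, y, w):
    """Exhaustively search (feature, threshold, polarity) for the
    decision stump minimizing the weighted classification error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[j] <= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best[1:]  # (feature index, threshold, polarity)

def stump_predict(stump, x):
    j, t, pol = stump
    return pol if x[j] <= t else -pol

def adaboost_train(X, y, rounds=10):
    """Discrete AdaBoost: labels y are +1/-1; returns a weighted ensemble."""
    n = len(X)
    w = [1.0 / n] * n          # uniform initial sample weights
    ensemble = []
    for _ in range(rounds):
        stump = stump_train(X, y, w)
        preds = [stump_predict(stump, x) for x in X]
        err = max(sum(wi for wi, p, yi in zip(w, preds, y) if p != yi), 1e-10)
        if err >= 0.5:         # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)   # base-learner weight
        ensemble.append((alpha, stump))
        # Re-weight samples: misclassified points gain weight
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(a * stump_predict(s, x) for a, s in ensemble)
    return 1 if score >= 0 else -1

# Toy usage on a one-dimensional, linearly separable set
X = [[1.0], [2.0], [3.0], [4.0]]
y = [-1, -1, 1, 1]
ensemble = adaboost_train(X, y, rounds=5)
print([adaboost_predict(ensemble, x) for x in X])  # [-1, -1, 1, 1]
```

The abstract's claim about margin distribution corresponds to the signed score in `adaboost_predict` (normalized by the sum of alphas): stronger base learners such as MDTs push that margin up in fewer boosting rounds than weak stumps.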

Key words: Gini index, decision tree, ensemble learning, AdaBoost algorithm, margin analysis

CLC Number: TP181
[1] HAN J W, PEI J, KAMBER M. Data mining, concepts and techniques[M]. Amsterdam: Elsevier, 2011.
[2] QUINLAN J R. C4.5:programs for machine learning[M]. San Francisco: Morgan Kaufmann Publishers, 1993.
[3] FRIEDMAN J H, OLSHEN R A, STONE C J, et al. Classification and regression trees[M]. Florida: CRC Press, 1984.
[4] FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting[C] //Proceedings of the Second European Conference on Computational Learning Theory. Berlin:Springer-Verlag, 1995: 23-37.
[5] FRIEDMAN J H. Greedy function approximation: a gradient boosting machine[J]. The Annals of Statistics, 2001, 29(5): 1189-1232.
[6] CHEN T Q, GUESTRIN C. XGBoost: a scalable tree boosting system[C] //Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 785-794.
[7] KE G L, MENG Q, FINLEY T, et al. LightGBM: a highly efficient gradient boosting decision tree[C] //31st Conference on Neural Information Processing Systems. Long Beach: NIPS, 2017: 3146-3154.
[8] ZHOU Z H, FENG J. Deep forest: towards an alternative to deep neural networks[C] //Twenty-sixth International Joint Conference on Artificial Intelligence. Sweden: IJCAI, 2017: 3553-3559.
[9] JIN X B, HOU X W, LIU C L. Multi-class AdaBoost with hypothesis margin[C] //2010 20th International Conference on Pattern Recognition. Piscataway: IEEE, 2010: 65-68.
[10] CAO J J, KWONG S, WANG R. A noise-detection based AdaBoost algorithm for mislabeled data[J]. Pattern Recognition, 2012, 45(12): 4451-4465.
[11] YAO X, WANG X D, ZHANG Y X, et al. A self-adaption ensemble algorithm based on random subspace and AdaBoost[J].Acta Electronica Sinica, 2013, 41(4): 810-814.
[12] SUN B, CHEN S, WANG J D, et al. A robust multi-class AdaBoost algorithm for mislabeled noisy data[J]. Knowledge-based Systems, 2016, 102: 87-102.
[13] WU R L, WANG L M, HU T Y. AdaBoost-SVM for electrical theft detection and GRNN for stealing time periods identification[C] //IECON 2018: 44th Annual Conference of the IEEE Industrial Electronics Society. Piscataway: IEEE, 2018: 3073-3078.
[14] SUN J, LI H, FUJITA H, et al. Class-imbalanced dynamic financial distress prediction based on AdaBoost-SVM ensemble combined with SMOTE and time weighting[J]. Information Fusion, 2020, 54: 128-144.
[15] LI K W, XIE P, ZHAI J N, et al. An improved AdaBoost algorithm for imbalanced data based on weighted KNN[C] //2017 IEEE 2nd International Conference on Big Data Analysis(ICBDA). Piscataway: IEEE, 2017: 30-34.
[16] YANG S, CHEN L F, YAN T, et al. An ensemble classification algorithm for convolutional neural network based on AdaBoost[C] //2017 IEEE/ACIS 16th International Conference on Computer and Information Science(ICIS). Piscataway: IEEE, 2017: 401-406.
[17] PUTRA M A, SETIAWAN N A, WIBIRAMA S, et al. Wart treatment method selection using AdaBoost with random forests as a weak learner[J]. Communications in Science and Technology, 2018, 3(2): 52-56.
[18] YIN Ru, MEN Changqian, WANG Wenjian. Model decision tree: an accelerated algorithm of decision tree[J]. Pattern Recognition and Artificial Intelligence, 2018, 31(7): 643-652. (in Chinese)
[19] SCHAPIRE R E, FREUND Y, BARTLETT P, et al. Boosting the margin: a new explanation for the effectiveness of voting methods[J]. The Annals of Statistics, 1998, 26(5):1651-1686.
[20] BREIMAN L. Prediction games and arcing algorithms[J]. Neural Computation, 1999, 11(7):1493-1517.
[21] GAO W, ZHOU Z H. On the doubt about margin explanation of boosting[J]. Artificial Intelligence, 2013, 203(10):1-18.