
Journal of Shandong University (Natural Science), 2024, Vol. 59, Issue 4: 108-116. doi: 10.6040/j.issn.1671-9352.0.2022.465


Model averaging study with a cured group right-censored data

Shuying WANG, Yanan ZHANG, Yunfei CHENG, Lifang ZHOU*

  1. School of Mathematics and Statistics, Changchun University of Technology, Changchun 130012, Jilin, China
  • Received: 2022-09-07  Online: 2024-04-20  Published: 2024-04-12
  • Contact: Lifang ZHOU  E-mail: wangshuying0601@163.com; 2283568775@qq.com
  • About the author: Shuying WANG (born 1990), female, associate professor, doctoral supervisor, Ph.D.; research interests: biostatistics and mathematical statistics. E-mail: wangshuying0601@163.com
  • Funding:
    Excellent Young Scientists Fund of the Natural Science Foundation of Jilin Province (20230101371JC)

Abstract:

Most existing survival analysis studies directly assume a model form linking the response variable to a specified set of covariates and then estimate the covariate effects. When that model assumption is wrong, however, the resulting conclusions may also be wrong. To avoid the inaccuracy caused by building a model on pre-specified covariates, an accelerated failure time (AFT) model based on model averaging is proposed for right-censored data with a cured group. Within the maximum likelihood framework, model selection and model averaging based on information criteria are used for statistical inference. Numerical simulations show that, for right-censored data with a cured group, the AFT model based on model averaging attains higher estimation and prediction accuracy than the model selection approach. Finally, the feasibility and practicality of the proposed method are verified by analyzing data from a melanoma clinical trial.

Key words: right-censored data, model averaging, mixture cure model, accelerated failure time model, maximum likelihood estimation
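To make the contrast between model selection and model averaging based on information criteria concrete, the following is a minimal sketch. It assumes the commonly used smoothed weights proportional to exp(-IC/2); this page does not give the paper's exact weight formula, so that choice is an assumption. The function names, log-likelihoods, parameter counts, and per-model estimates are hypothetical placeholders, not values from the paper.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * loglik


def smoothed_ic_weights(ic_values):
    """Weights proportional to exp(-IC_k / 2) (assumed smoothed-IC form).

    Subtracting the minimum IC first avoids underflow in exp().
    """
    best = min(ic_values)
    raw = [math.exp(-(ic - best) / 2.0) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, weights):
    """Weighted combination of the candidate models' estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical candidate models: maximized log-likelihood, number of
# parameters, and each model's estimate of one quantity of interest.
logliks = [-120.4, -118.9, -118.7]
n_params = [3, 4, 5]
estimates = [0.52, 0.61, 0.63]

aic_values = [aic(ll, k) for ll, k in zip(logliks, n_params)]

# Model selection keeps only the single model with the smallest AIC.
selected = min(range(len(aic_values)), key=aic_values.__getitem__)
selection_estimate = estimates[selected]

# Model averaging combines all candidates using smoothed AIC weights.
weights = smoothed_ic_weights(aic_values)
averaged_estimate = model_average(estimates, weights)

print("selected model index:", selected)
print("selection estimate:", selection_estimate)
print("smoothed AIC weights:", [round(w, 4) for w in weights])
print("averaged estimate:", round(averaged_estimate, 4))
```

Replacing `aic` with `bic` (and supplying the sample size) gives the corresponding smoothed BIC weights under the same assumption.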

CLC number: O212

Table 1. True parameter values for the three settings

Setting  n    η0   η0   β1    γ1    γ2    γ2    μ    σ
A        200  0.6  0.3  0.3   -0.5  0.3   -0.3  0.5  0.4
A        400  0.6  0.3  0.3   -0.5  0.3   -0.3  0.5  0.4
B        200  0.6  0.3  0.3   -0.5  -0.3  0.3   0.5  0.4
B        400  0.6  0.3  0.3   -0.5  -0.3  0.3   0.5  0.4
C        200  0.6  0.3  -0.3  -0.5  0.3   0.3   0.5  0.4
C        400  0.6  0.3  -0.3  -0.5  0.3   0.3   0.5  0.4

Table 2. EMS and EMSP of the five quantities of interest under setting A for the two sample sizes

Metric  n    Quantity  S-AIC   AIC     S-BIC   BIC
EMS     200  (a)       0.2225  0.4182  0.2232  0.4536
EMS     200  (b)       0.4604  0.8427  0.4620  0.9135
EMS     200  (c)       0.1681  0.3149  0.1683  0.3417
EMS     400  (a)       0.2225  0.4182  0.2232  0.4536
EMS     400  (b)       0.4604  0.8427  0.4620  0.9135
EMS     400  (c)       0.1681  0.3149  0.1683  0.3417
EMSP    200  (d)       0.0004  0.0004  0.0004  0.0004
EMSP    200  (e)       0.0002  0.0002  0.0002  0.0002
EMSP    400  (d)       0.0002  0.0002  0.0002  0.0002
EMSP    400  (e)       0.0001  0.0001  0.0001  0.0001

Table 3. EMS and EMSP of the five quantities of interest under setting B for the two sample sizes

Metric  n    Quantity  S-AIC   AIC     S-BIC   BIC
EMS     200  (a)       0.2294  0.4348  0.2298  0.4661
EMS     200  (b)       0.4749  0.8763  0.4758  0.9389
EMS     200  (c)       0.1733  0.3272  0.1733  0.3508
EMS     400  (a)       0.1108  0.3269  0.1110  0.3612
EMS     400  (b)       0.2293  0.6571  0.2296  0.7257
EMS     400  (c)       0.0837  0.2458  0.0837  0.2717
EMSP    200  (d)       0.0004  0.0005  0.0004  0.0005
EMSP    200  (e)       0.0002  0.0002  0.0002  0.0002
EMSP    400  (d)       0.0002  0.0002  0.0002  0.0002
EMSP    400  (e)       0.0001  0.0001  0.0001  0.0001

Table 4. EMS and EMSP of the five quantities of interest under setting C for the two sample sizes

Metric  n    Quantity  S-AIC   AIC     S-BIC   BIC
EMS     200  (a)       0.2251  0.4134  0.2260  0.4560
EMS     200  (b)       0.4663  0.8332  0.4682  0.9184
EMS     200  (c)       0.1702  0.3113  0.1705  0.3435
EMS     400  (a)       0.1147  0.3541  0.1150  0.3882
EMS     400  (b)       0.2371  0.7112  0.2377  0.7797
EMS     400  (c)       0.0867  0.2661  0.0867  0.2918
EMSP    200  (d)       0.0003  0.0005  0.0003  0.0005
EMSP    200  (e)       0.0002  0.0002  0.0002  0.0002
EMSP    400  (d)       0.0002  0.0002  0.0002  0.0002
EMSP    400  (e)       0.0001  0.0001  0.0001  0.0001

Figure 1. Kaplan-Meier curves of the treatment group and the control group

Table 5. Estimates of the parameters of interest for the melanoma clinical trial data

Parameter  Quantity       S-AIC                        S-BIC                        AIC/BIC
μ1         Estimate (SD)  0.7403 (0.3927)              0.7402 (0.3927)              -0.4485 (0.2542)
μ1         95% CI         (-0.0295, 1.5100)            (-0.0295, 1.5100)            (-0.9468, 0.0498)
μ2         Estimate (SD)  0.5628 (0.2630)              0.5629 (0.2630)              0.3091 (0.1856)
μ2         95% CI         (0.0473, 1.0783)             (0.0474, 1.0783)             (-0.0547, 0.6729)
μ3         Estimate (SD)  0.0023 (0.0081)              0.0023 (0.0081)
μ3         95% CI         (-0.0136, 0.0182)            (-0.0136, 0.0182)
μ4         Estimate (SD)  -2.6250e-05 (2.5502e-05)     -4.5440e-06 (5.6240e-06)     -0.0386 (0.1880)
μ4         95% CI         (-7.6333e-05, 2.3734e-05)    (-1.5567e-05, 6.4800e-06)    (-0.4071, 0.3300)
μ5         Estimate (SD)  1.3054 (0.6639)              1.3054 (0.6639)              -0.1780 (0.6279)
μ5         95% CI         (0.0042, 2.6066)             (0.0042, 2.6066)             (-1.4087, 1.0527)
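The 95% confidence intervals in Table 5 appear consistent with normal-approximation (Wald) intervals of the form estimate ± 1.96 × reported standard deviation. This is an inference from the reported numbers, not something stated on this page. The short check below recomputes a few S-AIC intervals under that assumption; the function name `wald_interval` is a hypothetical helper.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Normal-approximation interval: estimate +/- z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# (estimate, reported SD, reported 95% CI) copied from the S-AIC column of Table 5.
reported = {
    "mu1": (0.7403, 0.3927, (-0.0295, 1.5100)),
    "mu2": (0.5628, 0.2630, (0.0473, 1.0783)),
    "mu5": (1.3054, 0.6639, (0.0042, 2.6066)),
}

for name, (est, sd, ci) in reported.items():
    lower, upper = wald_interval(est, sd)
    # Small gaps are expected because the table values are rounded.
    print(f"{name}: computed ({lower:.4f}, {upper:.4f}), reported {ci}")
```

Under this reading, an interval that contains zero (for example μ1 under S-AIC) corresponds to an effect that is not significant at the 5% level.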

Table 6. Predicted survival probabilities of melanoma patients

Sample size  Statistic  S-AIC   S-BIC   AIC     BIC
160          Mean       0.5610  0.5608  0.5580  0.5576
160          Median     0.5117  0.5086  0.5019  0.5007
180          Mean       0.5623  0.5621  0.5594  0.5589
180          Median     0.5148  0.5115  0.5029  0.5018
200          Mean       0.5631  0.5628  0.5599  0.5597
200          Median     0.5152  0.5127  0.5037  0.5029
220          Mean       0.5635  0.5629  0.5599  0.5599
220          Median     0.5160  0.5129  0.5038  0.5030
240          Mean       0.5640  0.5640  0.5604  0.5603
240          Median     0.5160  0.5133  0.5041  0.5032