Bolin Wu

Survival Analysis 5: Accelerated Failure-time (AFT) model

2023-11-22 · 17 min read

If we want to model the relationship between the hazard function and covariates, we can use the Cox proportional hazards (PH) model. However, what if we want to model the survival time itself? In this case we can use Accelerated Failure-time (AFT) models.

Introduction

The AFT model is used to model the effect of covariates on the survival time. Its basic mathematical form is as below:

T = T_{0} \exp(z\beta)

or equivalently

\ln T = \ln T_{0} + z\beta

  • T is a vector of survival times.
  • z is a vector of covariates.
  • \beta is a vector of unknown regression parameters.
  • T_{0} is the baseline survival time, i.e. the survival time when the covariates z are all zero. In the AFT model, \ln T_{0} is treated as a random variable.

If we compare this with the Cox PH model

\lambda(t \mid z) = \lambda_{0}(t) \exp(z\beta)

we can see that the interpretation of \beta is different. In the PH model, \beta > 0 implies an increased hazard and hence a shorter survival time, whereas in AFT models \beta > 0 implies a longer survival time.
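To make the contrast concrete, here is a small worked example (my own illustration, not taken from any fitted model) with a single binary covariate z and \beta = 0.25:

T = T_{0} \exp(0.25 z), \qquad T\big|_{z=1} = e^{0.25}\, T_{0} \approx 1.28\, T_{0}

In the AFT model the covariate stretches the survival time by about 28%, whereas in a PH model the same coefficient would multiply the hazard by e^{0.25} \approx 1.28 and thus shorten the survival time.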

Model tuning

If we introduce an intercept term \alpha and a scale parameter \delta into the model:

\ln T = \alpha + z\beta + \delta \ln T_{0}

the failure time itself then takes the form

T = \exp(\alpha + z\beta + \delta \ln T_{0}) = T_{0}^{\delta} \exp(\alpha) \exp(z\beta)

The effect of the intercept and the scale parameter is to rescale and power the baseline survival time. In other words, the covariates in an AFT model affect the scale, but not the shape, of the baseline survival time distribution.

The model can also be rewritten in a form more akin to a linear regression model:

\ln T = z\beta + \delta \epsilon

  • where the intercept is incorporated in the coefficient vector \beta.

Here comes the key point of all the "fuss" of the transformations above.

We can further assume different distributions for the random error term by specifying the density function of \epsilon. This family is called the extended generalized-gamma (EGG) model.

  • When q = 0

f(q, \epsilon) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\epsilon^{2}}{2}\right)

  • When q \neq 0

f(q, \epsilon) = \frac{|q|}{\Gamma(q^{-2})} (q^{-2})^{q^{-2}} \exp\left[q^{-2}(q\epsilon - \exp(q\epsilon))\right]

The advantage of specifying the density function in this way is that different values of the shape parameter q yield different distributions for the error term, and hence for T.

  • When q = 0, T has a log-normal distribution.
  • When q = 1, T has a Weibull distribution.
  • When q = 1 and \delta = 1, T has an exponential distribution, which is a special case of the Weibull distribution.
  • When q = -1 and \delta = 1, T has a reciprocal-Weibull distribution.
  • When q = \delta, T has an ordinary gamma distribution.

These 5 models above are special cases of the EGG model. Another model of interest, though not a special case of the EGG model, is the log-logistic model.
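As a quick numerical check (my own sketch, not part of the original derivation), we can code the EGG error density directly, working on the log scale for numerical stability, and verify two special cases: q = 1 gives the standard minimum extreme-value density (the Weibull case), and q close to 0 approaches the standard normal density (the log-normal case).

# EGG error density; computed via the log scale to avoid overflow of Gamma(q^-2)
egg_density <- function(eps, q) {
  if (q == 0) return(dnorm(eps))                 # q = 0: standard normal density
  k <- q^-2
  exp(log(abs(q)) - lgamma(k) + k * log(k) + k * (q * eps - exp(q * eps)))
}

eps <- seq(-3, 3, by = 0.5)
# q = 1: matches the minimum extreme-value density exp(eps - e^eps) (Weibull case)
all.equal(egg_density(eps, 1), exp(eps - exp(eps)))
# q close to 0: approaches dnorm(eps) (log-normal case)
max(abs(egg_density(eps, 0.05) - dnorm(eps)))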

In R, the survreg() function supports the Weibull, exponential, Gaussian, logistic, log-normal, and log-logistic distributions. However, the reciprocal-Weibull distribution and the full EGG model are unfortunately not available in survreg().
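As a side note, survreg() exposes the exponential model as the Weibull model with the scale fixed at 1, mirroring the q = 1, \delta = 1 special case above. A quick sketch with the built-in lung data (my own example) shows the two calls give the same fit:

library(survival)
fit_exp <- survreg(Surv(time, status) ~ age, data = lung, dist = "exponential")
fit_wb1 <- survreg(Surv(time, status) ~ age, data = lung, dist = "weibull", scale = 1)
all.equal(coef(fit_exp), coef(fit_wb1))   # TRUE: identical coefficients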

Understanding the setting above also helps us interpret the R results properly.
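Before turning to real data, a tiny simulation (my own sketch; all names here are illustrative) may help connect the log-linear form \ln T = z\beta + \delta \epsilon with the fitting function: we draw the error from the minimum extreme-value distribution (the Weibull case, q = 1) and check that survreg() recovers the parameters.

library(survival)
set.seed(1)
n     <- 2000
z     <- rbinom(n, 1, 0.5)                 # one binary covariate
beta  <- 0.5                               # true effect on log survival time
delta <- 0.8                               # true scale
eps   <- log(rexp(n))                      # log of Exp(1) ~ minimum extreme value
time  <- exp(1 + beta * z + delta * eps)   # ln T = 1 + beta*z + delta*eps
cens  <- runif(n, 0, quantile(time, 0.9))  # independent censoring times
obs   <- pmin(time, cens)
event <- as.numeric(time <= cens)
fit   <- survreg(Surv(obs, event) ~ z, dist = "weibull")
coef(fit)    # intercept should be close to 1, coefficient of z close to 0.5
fit$scale    # should be close to 0.8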

Practice in R

In R we can use the survreg() function in the survival package to estimate AFT models.

Suppose we have the example data as below (only printing the first six rows):

## # A tibble: 6 × 6
##   Cluster BirCoh Residence  Educ ExpMonths FirstBirth
##     <dbl>  <dbl>     <dbl> <dbl>     <dbl>      <dbl>
## 1       1      2         3     0        27          0
## 2       1      4         3     0        11          1
## 3       1      4         3     0        63          0
## 4       1      7         3     0         5          1
## 5       1      1         3     0         4          1
## 6       1      7         3     0        29          1

Suppose now we are interested in modeling the effect of three covariates, birth cohort (1 indicating the youngest cohort and 7 the oldest), residence, and education, on the time until the event, where ExpMonths is the follow-up time in months and FirstBirth is the event indicator.

Fit the model under each available distribution:

library(survival)
# create a survival object
birth_time <- Surv(df$ExpMonths, df$FirstBirth)
  • Weibull distribution
weibull_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ), 
                     data = df, dist = "weibull")
summary(weibull_m)
## 
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) + 
##     as.factor(Educ), data = df, dist = "weibull")
##                          Value Std. Error      z       p
## (Intercept)            3.17399    0.06548  48.47 < 2e-16
## as.factor(BirCoh)2     0.25444    0.05901   4.31 1.6e-05
## as.factor(BirCoh)3     0.45794    0.05733   7.99 1.4e-15
## as.factor(BirCoh)4     0.66664    0.05870  11.36 < 2e-16
## as.factor(BirCoh)5     0.77122    0.05768  13.37 < 2e-16
## as.factor(BirCoh)6     0.96452    0.06093  15.83 < 2e-16
## as.factor(BirCoh)7     1.14638    0.06049  18.95 < 2e-16
## as.factor(Residence)2  0.00821    0.04360   0.19   0.851
## as.factor(Residence)3  0.01026    0.04316   0.24   0.812
## as.factor(Educ)1       0.07568    0.03138   2.41   0.016
## as.factor(Educ)2       0.44000    0.03914  11.24 < 2e-16
## as.factor(Educ)3       0.59055    0.04497  13.13 < 2e-16
## as.factor(Educ)4       0.80628    0.11497   7.01 2.3e-12
## Log(scale)            -0.14247    0.01040 -13.70 < 2e-16
## 
## Scale= 0.867 
## 
## Weibull distribution
## Loglik(model)= -25484.4   Loglik(intercept only)= -25987.8
##  Chisq= 1006.8 on 12 degrees of freedom, p= 6.5e-208 
## Number of Newton-Raphson Iterations: 5 
## n= 7074
  • Exponential distribution
exponential_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ),
                         data = df, dist = "exponential")
summary(exponential_m)
## 
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) + 
##     as.factor(Educ), data = df, dist = "exponential")
##                         Value Std. Error     z       p
## (Intercept)           3.14259    0.07548 41.64 < 2e-16
## as.factor(BirCoh)2    0.25450    0.06804  3.74 0.00018
## as.factor(BirCoh)3    0.44630    0.06610  6.75 1.5e-11
## as.factor(BirCoh)4    0.64899    0.06766  9.59 < 2e-16
## as.factor(BirCoh)5    0.75716    0.06649 11.39 < 2e-16
## as.factor(BirCoh)6    0.94785    0.07023 13.50 < 2e-16
## as.factor(BirCoh)7    1.12131    0.06971 16.09 < 2e-16
## as.factor(Residence)2 0.00673    0.05032  0.13 0.89362
## as.factor(Residence)3 0.01391    0.04979  0.28 0.77988
## as.factor(Educ)1      0.09174    0.03616  2.54 0.01118
## as.factor(Educ)2      0.50488    0.04477 11.28 < 2e-16
## as.factor(Educ)3      0.68396    0.05141 13.30 < 2e-16
## as.factor(Educ)4      0.93782    0.13214  7.10 1.3e-12
## 
## Scale fixed at 1 
## 
## Exponential distribution
## Loglik(model)= -25571.2   Loglik(intercept only)= -26008.6
##  Chisq= 874.91 on 12 degrees of freedom, p= 1.4e-179 
## Number of Newton-Raphson Iterations: 5 
## n= 7074
  • Gaussian
gaussian_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ), data = df, dist = "gaussian")
summary(gaussian_m)
## 
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) + 
##     as.factor(Educ), data = df, dist = "gaussian")
##                          Value Std. Error      z       p
## (Intercept)           23.15082    3.38398   6.84 7.8e-12
## as.factor(BirCoh)2     7.47071    3.07365   2.43   0.015
## as.factor(BirCoh)3    13.27059    2.98868   4.44 9.0e-06
## as.factor(BirCoh)4    21.00174    3.05946   6.86 6.7e-12
## as.factor(BirCoh)5    25.63538    3.00421   8.53 < 2e-16
## as.factor(BirCoh)6    36.20501    3.17033  11.42 < 2e-16
## as.factor(BirCoh)7    47.04573    3.15927  14.89 < 2e-16
## as.factor(Residence)2  0.05391    2.13862   0.03   0.980
## as.factor(Residence)3  0.40193    2.13717   0.19   0.851
## as.factor(Educ)1       3.67763    1.67694   2.19   0.028
## as.factor(Educ)2      19.95763    1.92443  10.37 < 2e-16
## as.factor(Educ)3      26.59970    2.16815  12.27 < 2e-16
## as.factor(Educ)4      34.57329    4.93346   7.01 2.4e-12
## Log(scale)             3.89056    0.00983 395.89 < 2e-16
## 
## Scale= 48.9 
## 
## Gaussian distribution
## Loglik(model)= -28348.5   Loglik(intercept only)= -28719.8
##  Chisq= 742.56 on 12 degrees of freedom, p= 3.4e-151 
## Number of Newton-Raphson Iterations: 3 
## n= 7074
  • Logistic
logistic_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ),
                      data = df, dist = "logistic")
summary(logistic_m)
## 
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) + 
##     as.factor(Educ), data = df, dist = "logistic")
##                         Value Std. Error      z       p
## (Intercept)           20.7588     2.6293   7.90 2.9e-15
## as.factor(BirCoh)2     6.6252     2.2948   2.89  0.0039
## as.factor(BirCoh)3    10.7559     2.2493   4.78 1.7e-06
## as.factor(BirCoh)4    16.1631     2.3335   6.93 4.3e-12
## as.factor(BirCoh)5    19.5924     2.2943   8.54 < 2e-16
## as.factor(BirCoh)6    27.4171     2.5039  10.95 < 2e-16
## as.factor(BirCoh)7    34.3870     2.5433  13.52 < 2e-16
## as.factor(Residence)2  0.1172     1.8009   0.07  0.9481
## as.factor(Residence)3  1.0303     1.7846   0.58  0.5637
## as.factor(Educ)1       2.6881     1.3469   2.00  0.0460
## as.factor(Educ)2      16.3524     1.6082  10.17 < 2e-16
## as.factor(Educ)3      22.7843     1.8429  12.36 < 2e-16
## as.factor(Educ)4      29.7262     4.3863   6.78 1.2e-11
## Log(scale)             3.1674     0.0118 267.36 < 2e-16
## 
## Scale= 23.7 
## 
## Logistic distribution
## Loglik(model)= -27832.9   Loglik(intercept only)= -28151.1
##  Chisq= 636.29 on 12 degrees of freedom, p= 1.9e-128 
## Number of Newton-Raphson Iterations: 4 
## n= 7074
  • Lognormal
lognormal_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ), 
                       data = df, dist = "lognormal")
summary(lognormal_m)
## 
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) + 
##     as.factor(Educ), data = df, dist = "lognormal")
##                         Value Std. Error     z       p
## (Intercept)           2.77519    0.07169 38.71 < 2e-16
## as.factor(BirCoh)2    0.31046    0.06530  4.75 2.0e-06
## as.factor(BirCoh)3    0.43949    0.06347  6.92 4.4e-12
## as.factor(BirCoh)4    0.58090    0.06495  8.94 < 2e-16
## as.factor(BirCoh)5    0.69226    0.06373 10.86 < 2e-16
## as.factor(BirCoh)6    0.85346    0.06717 12.71 < 2e-16
## as.factor(BirCoh)7    0.96719    0.06692 14.45 < 2e-16
## as.factor(Residence)2 0.00270    0.04521  0.06 0.95241
## as.factor(Residence)3 0.01486    0.04519  0.33 0.74223
## as.factor(Educ)1      0.08764    0.03555  2.46 0.01370
## as.factor(Educ)2      0.41984    0.04068 10.32 < 2e-16
## as.factor(Educ)3      0.58519    0.04585 12.76 < 2e-16
## as.factor(Educ)4      0.73771    0.10379  7.11 1.2e-12
## Log(scale)            0.03719    0.00991  3.75 0.00018
## 
## Scale= 1.04 
## 
## Log Normal distribution
## Loglik(model)= -25413.2   Loglik(intercept only)= -25747.2
##  Chisq= 667.95 on 12 degrees of freedom, p= 3.2e-135 
## Number of Newton-Raphson Iterations: 3 
## n= 7074
  • Loglogistic
loglogistic_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ), 
                         data = df, dist = "loglogistic")
summary(loglogistic_m)
## 
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) + 
##     as.factor(Educ), data = df, dist = "loglogistic")
##                         Value Std. Error      z       p
## (Intercept)            2.8516     0.0697  40.90 < 2e-16
## as.factor(BirCoh)2     0.2528     0.0629   4.02 5.8e-05
## as.factor(BirCoh)3     0.3682     0.0615   5.99 2.1e-09
## as.factor(BirCoh)4     0.5219     0.0632   8.25 < 2e-16
## as.factor(BirCoh)5     0.6217     0.0620  10.03 < 2e-16
## as.factor(BirCoh)6     0.7947     0.0659  12.07 < 2e-16
## as.factor(BirCoh)7     0.9377     0.0659  14.24 < 2e-16
## as.factor(Residence)2 -0.0050     0.0449  -0.11   0.911
## as.factor(Residence)3  0.0138     0.0446   0.31   0.756
## as.factor(Educ)1       0.0708     0.0347   2.04   0.041
## as.factor(Educ)2       0.4203     0.0403  10.43 < 2e-16
## as.factor(Educ)3       0.5763     0.0457  12.62 < 2e-16
## as.factor(Educ)4       0.7488     0.1060   7.06 1.6e-12
## Log(scale)            -0.5306     0.0114 -46.68 < 2e-16
## 
## Scale= 0.588 
## 
## Log logistic distribution
## Loglik(model)= -25397.8   Loglik(intercept only)= -25736.6
##  Chisq= 677.55 on 12 degrees of freedom, p= 2.8e-137 
## Number of Newton-Raphson Iterations: 4 
## n= 7074
  • Result
  1. All variables are significant except for the two residence levels.
  2. According to the estimated coefficients, for birth cohort, the higher the level, the longer the time to the event.
  3. For residence, the estimated effects are small and not significant: the level 2 and level 3 coefficients are positive (slightly longer time to event) in all models except the log-logistic model, where the level 2 coefficient is slightly negative.
  4. For education, the higher the level, the longer the time to the event, with a particularly large jump from education level 1 to level 2 (the sketch below turns these coefficients into time ratios).
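To back these observations up numerically, here is a short sketch (assuming the six fitted model objects above) that compares the models and converts the Weibull coefficients into acceleration factors (time ratios):

# survreg() reports log-likelihoods on the original time scale, so AIC can be
# compared across distributions; here the log-logistic model comes out best.
AIC(weibull_m, exponential_m, gaussian_m, logistic_m, lognormal_m, loglogistic_m)

# Acceleration factors from the Weibull model: exp(beta) > 1 means a longer
# time to the event relative to the reference level of that covariate.
round(exp(coef(weibull_m)), 2)
# e.g. exp(0.44) ~ 1.55 for Educ = 2: about 55% longer time to the event,
# holding the other covariates fixed.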

Model estimation

In the summary() output you can see the number of "Newton-Raphson" iterations. Newton-Raphson is the method used for maximizing the likelihood. More specifically:

  • Consider the survival times of n individuals to be t_{1}, t_{2}, \ldots, t_{n} and p covariates z_{1}, z_{2}, \ldots, z_{p}.
  • Let d_{i} = 0 if t_{i} is a censoring time and d_{i} = 1 if t_{i} is a time to the event.
  • The log-likelihood function l(\ln t; z\beta, \delta, q) assumes a noninformative censoring mechanism and is therefore proportional to

\sum_{i=1}^{n} d_{i} \left[\ln f(q, \epsilon_{i}) - \ln \delta\right] + \sum_{i=1}^{n} (1 - d_{i}) \ln S(q, \epsilon_{i})

where \epsilon_{i} = \frac{\ln T_{i} - z_{i}\beta}{\delta} and S(q, \epsilon) is the corresponding survivor function. Newton-Raphson is then used to maximize this likelihood. See Allison (2010) on carrying out the algorithm in SAS.
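To make the likelihood concrete, here is a minimal sketch (my own illustration, not how survreg() is implemented internally) of the censored log-likelihood above for the Weibull special case (q = 1), where the error has log density \ln f(1, \epsilon) = \epsilon - e^{\epsilon} and log survivor function \ln S(1, \epsilon) = -e^{\epsilon}, maximised with a general-purpose optimiser:

weibull_aft_loglik <- function(par, logt, z, d) {
  beta  <- par[seq_len(ncol(z))]     # regression coefficients (incl. intercept)
  delta <- exp(par[ncol(z) + 1])     # scale, kept positive via log-parameterisation
  eps   <- as.vector((logt - z %*% beta) / delta)
  log_f <- eps - exp(eps)            # ln f(1, eps)
  log_S <- -exp(eps)                 # ln S(1, eps)
  sum(d * (log_f - log(delta)) + (1 - d) * log_S)
}

# Reuse the design matrix of the model above; start from a least-squares fit
z_mat <- model.matrix(~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ), df)
start <- c(coef(lm(log(df$ExpMonths) ~ z_mat - 1)), 0)
fit   <- optim(start, weibull_aft_loglik, logt = log(df$ExpMonths), z = z_mat,
               d = df$FirstBirth, control = list(fnscale = -1), method = "BFGS")
fit$value   # should be close to Loglik(model) = -25484.4 reported by survreg()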

Ending

I remember that when I started learning the mathematics behind AFT models, I wondered why we needed so many formula transformations and extra terms such as the scale and shape parameters. After applying the models in R, I realized the necessity of understanding the derivation process. It is interesting to combine theory and application.

Prudence is a fountain of life to the prudent.