If we want to model the relationship between the hazard function and other variables, we can use the Cox proportional hazards (PH) model. However, what if we want to model the survival time itself? In this case we can use accelerated failure-time (AFT) models.
The AFT model describes the effect of the covariates on the survival time directly. Its basic mathematical form is

$$S(t \mid x_i) = S_0\!\left(t\, e^{-x_i'\beta}\right),$$

or equivalently

$$T_i = e^{x_i'\beta}\, T_{0i},$$

where $T_{0i}$ is a baseline survival time with survivor function $S_0$.
If we compare this with the Cox PH model,

$$h(t \mid x_i) = h_0(t)\, e^{x_i'\beta},$$

we can see that the interpretation of $\beta$ is different. For the PH model, $\beta_j > 0$ implies an increased hazard and hence a shorter survival time, whereas it is the opposite for AFT models: $\beta_j > 0$ stretches the time scale and implies a longer survival time.
If we introduce an intercept term $\beta_0$ and a scale parameter $\sigma$ to the model,

$$\log T_i = \beta_0 + x_i'\beta + \sigma\,\epsilon_i,$$

the original failure time takes the form

$$T_i = e^{\beta_0 + x_i'\beta}\,\big(e^{\epsilon_i}\big)^{\sigma} = e^{\beta_0 + x_i'\beta}\, T_{0i}^{\,\sigma}.$$

The effect of the intercept and scale is to rescale and power the baseline survival time. In other words, the covariates in an AFT model affect the scale but not the location of a baseline survival time distribution. Such a model can also be rewritten, more akin to a linear regression model, in the form

$$\log T_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \sigma\,\epsilon_i.$$
Here comes the key point of all the "fuss" of the transformations above.
We can further assume different distributions for the random error term by specifying the density function of $\epsilon_i$. One flexible choice, with shape parameter $q$, is

$$f(\epsilon) = \frac{|q|\,(q^{-2})^{q^{-2}}}{\Gamma(q^{-2})}\,\exp\!\left[q^{-2}\left(q\epsilon - e^{q\epsilon}\right)\right], \qquad q \neq 0,$$

with the standard normal density as the limiting case $q = 0$. The resulting model is called the extended generalized-gamma (EGG) model.
The advantage of specifying the density in this way is that different values of the shape parameter $q$ give different distributions for the error term: $q = 1$ corresponds to the Weibull model (and, with $\sigma = 1$ in addition, the exponential model), $q = 0$ to the lognormal model, $q = -1$ to the reciprocal-Weibull model, and $q = \sigma$ to the standard gamma model. These five models are special cases of the EGG model. Another model of interest, though not a special case of the EGG model, is the log-logistic model.
In R we can fit the Weibull, exponential, Gaussian, logistic, lognormal, and log-logistic distributions. Unfortunately, the reciprocal-Weibull distribution and the EGG model are not available in the survival package.
Understanding the setup above also helps us interpret the R results properly.
In R we can use the survreg() function in the survival package to estimate the AFT model.
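Before turning to real data, here is a minimal simulation sketch of my own (the parameter values and object names are arbitrary, not from any particular dataset) illustrating the log-linear setup: if the error term follows the standard minimum extreme-value distribution, the failure time is Weibull and survreg() should recover the intercept, coefficient, and scale we plugged in.

library(survival)
set.seed(42)
n <- 5000
x <- rbinom(n, 1, 0.5)
# if U ~ Exp(1), then log(U) follows the standard minimum extreme-value distribution
eps <- log(rexp(n))
# log T = beta0 + beta1 * x + sigma * eps, with beta0 = 1, beta1 = 0.7, sigma = 0.5
time <- exp(1 + 0.7 * x + 0.5 * eps)
status <- rep(1, n)  # no censoring, to keep the sketch simple
sim_fit <- survreg(Surv(time, status) ~ x, dist = "weibull")
summary(sim_fit)  # intercept, coefficient and scale should be close to 1, 0.7 and 0.5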
If we have the example data as below (only printing the first 6 rows):
## # A tibble: 6 × 6
## Cluster BirCoh Residence Educ ExpMonths FirstBirth
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 3 0 27 0
## 2 1 4 3 0 11 1
## 3 1 4 3 0 63 0
## 4 1 7 3 0 5 1
## 5 1 1 3 0 4 1
## 6 1 7 3 0 29 1
Suppose now we are interested in modeling the effect of three covariates, birth cohort (1 indicating the youngest cohort and 7 the oldest), residence, and education, on the time until the event (here, the first birth).
Fit the model under each of the available distributions:
library(survival)
# create a survival object
birth_time <- Surv(df$ExpMonths, df$FirstBirth)
weibull_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ),
data = df, dist = "weibull")
summary(weibull_m)
##
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) +
## as.factor(Educ), data = df, dist = "weibull")
## Value Std. Error z p
## (Intercept) 3.17399 0.06548 48.47 < 2e-16
## as.factor(BirCoh)2 0.25444 0.05901 4.31 1.6e-05
## as.factor(BirCoh)3 0.45794 0.05733 7.99 1.4e-15
## as.factor(BirCoh)4 0.66664 0.05870 11.36 < 2e-16
## as.factor(BirCoh)5 0.77122 0.05768 13.37 < 2e-16
## as.factor(BirCoh)6 0.96452 0.06093 15.83 < 2e-16
## as.factor(BirCoh)7 1.14638 0.06049 18.95 < 2e-16
## as.factor(Residence)2 0.00821 0.04360 0.19 0.851
## as.factor(Residence)3 0.01026 0.04316 0.24 0.812
## as.factor(Educ)1 0.07568 0.03138 2.41 0.016
## as.factor(Educ)2 0.44000 0.03914 11.24 < 2e-16
## as.factor(Educ)3 0.59055 0.04497 13.13 < 2e-16
## as.factor(Educ)4 0.80628 0.11497 7.01 2.3e-12
## Log(scale) -0.14247 0.01040 -13.70 < 2e-16
##
## Scale= 0.867
##
## Weibull distribution
## Loglik(model)= -25484.4 Loglik(intercept only)= -25987.8
## Chisq= 1006.8 on 12 degrees of freedom, p= 6.5e-208
## Number of Newton-Raphson Iterations: 5
## n= 7074
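As a quick interpretation aid (my own sketch, not part of the original output): the coefficients are on the log-time scale, so exponentiating them gives multiplicative effects on survival time, and for the Weibull model they can also be converted to PH-style hazard ratios via $\exp(-\hat\beta_j/\hat\sigma)$.

# exp(coef) gives time ratios: e.g. the oldest cohort's time to first birth is
# stretched by a factor of about exp(1.146) ~ 3.1 relative to the youngest (reference) cohort
exp(coef(weibull_m))
# For the Weibull model only, AFT coefficients convert to PH hazard ratios
exp(-coef(weibull_m) / weibull_m$scale)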
exponential_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ),
data = df, dist = "exponential")
summary(exponential_m)
##
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) +
## as.factor(Educ), data = df, dist = "exponential")
## Value Std. Error z p
## (Intercept) 3.14259 0.07548 41.64 < 2e-16
## as.factor(BirCoh)2 0.25450 0.06804 3.74 0.00018
## as.factor(BirCoh)3 0.44630 0.06610 6.75 1.5e-11
## as.factor(BirCoh)4 0.64899 0.06766 9.59 < 2e-16
## as.factor(BirCoh)5 0.75716 0.06649 11.39 < 2e-16
## as.factor(BirCoh)6 0.94785 0.07023 13.50 < 2e-16
## as.factor(BirCoh)7 1.12131 0.06971 16.09 < 2e-16
## as.factor(Residence)2 0.00673 0.05032 0.13 0.89362
## as.factor(Residence)3 0.01391 0.04979 0.28 0.77988
## as.factor(Educ)1 0.09174 0.03616 2.54 0.01118
## as.factor(Educ)2 0.50488 0.04477 11.28 < 2e-16
## as.factor(Educ)3 0.68396 0.05141 13.30 < 2e-16
## as.factor(Educ)4 0.93782 0.13214 7.10 1.3e-12
##
## Scale fixed at 1
##
## Exponential distribution
## Loglik(model)= -25571.2 Loglik(intercept only)= -26008.6
## Chisq= 874.91 on 12 degrees of freedom, p= 1.4e-179
## Number of Newton-Raphson Iterations: 5
## n= 7074
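Since the exponential model is the Weibull model with the scale fixed at 1, the two fits can be compared with a likelihood-ratio test. This is a sketch of my own, using the log-likelihoods stored in the fitted survreg objects:

# likelihood-ratio test of Weibull vs. exponential (H0: scale = 1)
lr <- 2 * (weibull_m$loglik[2] - exponential_m$loglik[2])
lr                                      # about 2 * (-25484.4 + 25571.2) = 173.6
pchisq(lr, df = 1, lower.tail = FALSE)  # very small p-value, favouring the Weibull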
gaussian_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ), data = df, dist = "gaussian")
summary(gaussian_m)
##
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) +
## as.factor(Educ), data = df, dist = "gaussian")
## Value Std. Error z p
## (Intercept) 23.15082 3.38398 6.84 7.8e-12
## as.factor(BirCoh)2 7.47071 3.07365 2.43 0.015
## as.factor(BirCoh)3 13.27059 2.98868 4.44 9.0e-06
## as.factor(BirCoh)4 21.00174 3.05946 6.86 6.7e-12
## as.factor(BirCoh)5 25.63538 3.00421 8.53 < 2e-16
## as.factor(BirCoh)6 36.20501 3.17033 11.42 < 2e-16
## as.factor(BirCoh)7 47.04573 3.15927 14.89 < 2e-16
## as.factor(Residence)2 0.05391 2.13862 0.03 0.980
## as.factor(Residence)3 0.40193 2.13717 0.19 0.851
## as.factor(Educ)1 3.67763 1.67694 2.19 0.028
## as.factor(Educ)2 19.95763 1.92443 10.37 < 2e-16
## as.factor(Educ)3 26.59970 2.16815 12.27 < 2e-16
## as.factor(Educ)4 34.57329 4.93346 7.01 2.4e-12
## Log(scale) 3.89056 0.00983 395.89 < 2e-16
##
## Scale= 48.9
##
## Gaussian distribution
## Loglik(model)= -28348.5 Loglik(intercept only)= -28719.8
## Chisq= 742.56 on 12 degrees of freedom, p= 3.4e-151
## Number of Newton-Raphson Iterations: 3
## n= 7074
logistic_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ),
data = df, dist = "logistic")
summary(logistic_m)
##
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) +
## as.factor(Educ), data = df, dist = "logistic")
## Value Std. Error z p
## (Intercept) 20.7588 2.6293 7.90 2.9e-15
## as.factor(BirCoh)2 6.6252 2.2948 2.89 0.0039
## as.factor(BirCoh)3 10.7559 2.2493 4.78 1.7e-06
## as.factor(BirCoh)4 16.1631 2.3335 6.93 4.3e-12
## as.factor(BirCoh)5 19.5924 2.2943 8.54 < 2e-16
## as.factor(BirCoh)6 27.4171 2.5039 10.95 < 2e-16
## as.factor(BirCoh)7 34.3870 2.5433 13.52 < 2e-16
## as.factor(Residence)2 0.1172 1.8009 0.07 0.9481
## as.factor(Residence)3 1.0303 1.7846 0.58 0.5637
## as.factor(Educ)1 2.6881 1.3469 2.00 0.0460
## as.factor(Educ)2 16.3524 1.6082 10.17 < 2e-16
## as.factor(Educ)3 22.7843 1.8429 12.36 < 2e-16
## as.factor(Educ)4 29.7262 4.3863 6.78 1.2e-11
## Log(scale) 3.1674 0.0118 267.36 < 2e-16
##
## Scale= 23.7
##
## Logistic distribution
## Loglik(model)= -27832.9 Loglik(intercept only)= -28151.1
## Chisq= 636.29 on 12 degrees of freedom, p= 1.9e-128
## Number of Newton-Raphson Iterations: 4
## n= 7074
lognormal_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ),
data = df, dist = "lognormal")
summary(lognormal_m)
##
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) +
## as.factor(Educ), data = df, dist = "lognormal")
## Value Std. Error z p
## (Intercept) 2.77519 0.07169 38.71 < 2e-16
## as.factor(BirCoh)2 0.31046 0.06530 4.75 2.0e-06
## as.factor(BirCoh)3 0.43949 0.06347 6.92 4.4e-12
## as.factor(BirCoh)4 0.58090 0.06495 8.94 < 2e-16
## as.factor(BirCoh)5 0.69226 0.06373 10.86 < 2e-16
## as.factor(BirCoh)6 0.85346 0.06717 12.71 < 2e-16
## as.factor(BirCoh)7 0.96719 0.06692 14.45 < 2e-16
## as.factor(Residence)2 0.00270 0.04521 0.06 0.95241
## as.factor(Residence)3 0.01486 0.04519 0.33 0.74223
## as.factor(Educ)1 0.08764 0.03555 2.46 0.01370
## as.factor(Educ)2 0.41984 0.04068 10.32 < 2e-16
## as.factor(Educ)3 0.58519 0.04585 12.76 < 2e-16
## as.factor(Educ)4 0.73771 0.10379 7.11 1.2e-12
## Log(scale) 0.03719 0.00991 3.75 0.00018
##
## Scale= 1.04
##
## Log Normal distribution
## Loglik(model)= -25413.2 Loglik(intercept only)= -25747.2
## Chisq= 667.95 on 12 degrees of freedom, p= 3.2e-135
## Number of Newton-Raphson Iterations: 3
## n= 7074
loglogistic_m <- survreg(birth_time ~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ),
data = df, dist = "loglogistic")
summary(loglogistic_m)
##
## Call:
## survreg(formula = birth_time ~ as.factor(BirCoh) + as.factor(Residence) +
## as.factor(Educ), data = df, dist = "loglogistic")
## Value Std. Error z p
## (Intercept) 2.8516 0.0697 40.90 < 2e-16
## as.factor(BirCoh)2 0.2528 0.0629 4.02 5.8e-05
## as.factor(BirCoh)3 0.3682 0.0615 5.99 2.1e-09
## as.factor(BirCoh)4 0.5219 0.0632 8.25 < 2e-16
## as.factor(BirCoh)5 0.6217 0.0620 10.03 < 2e-16
## as.factor(BirCoh)6 0.7947 0.0659 12.07 < 2e-16
## as.factor(BirCoh)7 0.9377 0.0659 14.24 < 2e-16
## as.factor(Residence)2 -0.0050 0.0449 -0.11 0.911
## as.factor(Residence)3 0.0138 0.0446 0.31 0.756
## as.factor(Educ)1 0.0708 0.0347 2.04 0.041
## as.factor(Educ)2 0.4203 0.0403 10.43 < 2e-16
## as.factor(Educ)3 0.5763 0.0457 12.62 < 2e-16
## as.factor(Educ)4 0.7488 0.1060 7.06 1.6e-12
## Log(scale) -0.5306 0.0114 -46.68 < 2e-16
##
## Scale= 0.588
##
## Log logistic distribution
## Loglik(model)= -25397.8 Loglik(intercept only)= -25736.6
## Chisq= 677.55 on 12 degrees of freedom, p= 2.8e-137
## Number of Newton-Raphson Iterations: 4
## n= 7074
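To choose among the fitted distributions, one option (again a sketch of my own, which assumes that AIC() can extract log-likelihoods from survreg fits) is to compare AIC values:

# lower AIC indicates a better trade-off between fit and complexity;
# this relies on logLik()/AIC() support for survreg objects
AIC(weibull_m, exponential_m, gaussian_m, logistic_m, lognormal_m, loglogistic_m)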
In the summary() output you can see the "Number of Newton-Raphson Iterations". Newton-Raphson is the method used for maximizing the likelihood, which for right-censored data is

$$L = \prod_{i=1}^{n} f(t_i)^{\delta_i}\, S(t_i)^{1-\delta_i},$$

where $\delta_i$ is the event indicator ($\delta_i = 1$ if the event is observed and $0$ if censored), $f$ is the density implied by the chosen distribution, and $S(t_i)$ is the corresponding survivor function. One can see Allison (2010) about carrying out the algorithm in SAS.
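To see what those iterations are maximizing, here is a rough sketch of my own (not the author's code, and using optim()'s BFGS rather than Newton-Raphson proper): it codes the censored log-likelihood for the exponential model directly and checks that numerical maximization reproduces survreg()'s estimates. The object names X, negloglik, and ml_fit are mine.

# design matrix matching the survreg() formula
X <- model.matrix(~ as.factor(BirCoh) + as.factor(Residence) + as.factor(Educ), data = df)
# negative censored log-likelihood: events contribute log f(t), censored cases log S(t)
negloglik <- function(beta) {
  rate <- exp(-drop(X %*% beta))   # exponential AFT: T ~ Exp(rate = exp(-x'beta))
  -sum(ifelse(df$FirstBirth == 1,
              dexp(df$ExpMonths, rate, log = TRUE),
              pexp(df$ExpMonths, rate, lower.tail = FALSE, log.p = TRUE)))
}
ml_fit <- optim(rep(0, ncol(X)), negloglik, method = "BFGS", control = list(maxit = 500))
cbind(optim = ml_fit$par, survreg = coef(exponential_m))  # estimates should agree closely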
I remember that when I first learned the mathematics behind AFT models, I wondered why we need to make so many formula transformations and to add terms like the scale and shape parameters. After applying the model in R, I realized the necessity of understanding the derivation process. It is interesting to bring theory and application together.