This post is to share the two common non-parametric tests of comparing the survival functions: Log-Rank Test & Generalized Wilcoxon Test, as well as their corresponding calculations in the detailed process.
Assume we have two groups of samples, one with treatment and the other without treatment. Their survival functions are $S_{1}(t)$ and $S_{2}(t)$. We aim to test if these two groups' survival functions are different are not: $H_{0}:$ $S_{1}(t) = S_{2}(t)$.
If there is no censoring , we can use standard non-parametric methods, e.g., Wilcoxon test. However, with presense of censoring we have to use special non-parametric tests. Two common tests are Log-Rank Test and Generalized Wilcoxon Test.
Let's start with this this setting:
There are two important estimators, expected number of events and variance:
The expected events in the first group:
$\hat{E}_{t1} = \frac{R_{t1}}{R_{t}}(E_{t})$
At each event time t, the variance is
$Var(\hat{E}_{t1}) = \frac{R_{t1}R_{t2}E_{t}}{R_{t}^2 (R_{t} - 1)}(R_{t} - E_{t})$
With these two estimators, we can get the Log-Rank Test statistics as:
$\frac{\sum_{t} (E_{t1}-\hat{E}_{t1})}{\sqrt{\sum_{t}V_{t}}} \sim N(0,1)$
Or equivalently:
$\frac{(\sum_{t} E_{t1}-\hat{E}_{t1})^2}{\sum_{t}V_{t}} \sim \chi^2(1)$
The Generalized Wilcoxon Test is similar to the Log-Rank Test. One difference is that the Log-Rank Test is based on the differences of $E_{t1} - \hat{E}_{t1}$ and the variance $V_{t}$. Whereas the Generalized Wilcoxon Test uses a weighted version of the differences:
$b_{t} = R_{t}(E_{t1}-\hat{E}_{t1})$
$g_{t} = R_{t}^2V_{t}$
$\frac{(\sum_{t} b_{t})^2}{\sum_{t}g_{t}} \sim \chi^2(1)$
We must note that the risk set $R_{t}$ is large at the beginning of the study period and gets smaller towards the end. As a result, $E_{t1} - \hat{E}_{t1}$ gets larger weights at the beginning and smaller weights towards the end. Therefore, Log-Rank and Generalized Wilcoxon may lead to different results when the $E_{t1} - \hat{E}_{t1}$ are not uniform.
You may wonder when to use which test. This paper gives comprehensive explanations.
In summary, when you want to compare the survival curves of two independent groups:
The example data is from the SU survival analysis course. Before applying the survival
R package, I would like to first dementrate the manual calculation.
Months | Status | Group | Group1 | Group2 |
---|---|---|---|---|
7 | 1 | 1 | 1 | 0 |
14 | 1 | 1 | 1 | 0 |
24 | 1 | 1 | 1 | 0 |
27 | 1 | 1 | 1 | 0 |
27 | 1 | 1 | 1 | 0 |
50 | 1 | 1 | 1 | 0 |
51 | 1 | 1 | 1 | 0 |
56 | 1 | 1 | 1 | 0 |
58 | 1 | 1 | 1 | 0 |
60 | 1 | 1 | 1 | 0 |
62 | 1 | 1 | 1 | 0 |
69 | 0 | 1 | 1 | 0 |
71 | 1 | 1 | 1 | 0 |
74 | 1 | 1 | 1 | 0 |
74 | 1 | 1 | 1 | 0 |
76 | 1 | 1 | 1 | 0 |
80 | 0 | 1 | 1 | 0 |
80 | 0 | 1 | 1 | 0 |
80 | 0 | 1 | 1 | 0 |
88 | 0 | 1 | 1 | 0 |
93 | 1 | 1 | 1 | 0 |
98 | 1 | 1 | 1 | 0 |
104 | 0 | 1 | 1 | 0 |
1 | 1 | 2 | 0 | 1 |
1 | 1 | 2 | 0 | 1 |
2 | 1 | 2 | 0 | 1 |
8 | 1 | 2 | 0 | 1 |
9 | 1 | 2 | 0 | 1 |
9 | 1 | 2 | 0 | 1 |
14 | 1 | 2 | 0 | 1 |
17 | 1 | 2 | 0 | 1 |
20 | 1 | 2 | 0 | 1 |
27 | 1 | 2 | 0 | 1 |
34 | 1 | 2 | 0 | 1 |
43 | 1 | 2 | 0 | 1 |
45 | 1 | 2 | 0 | 1 |
47 | 1 | 2 | 0 | 1 |
55 | 1 | 2 | 0 | 1 |
56 | 1 | 2 | 0 | 1 |
57 | 0 | 2 | 0 | 1 |
62 | 1 | 2 | 0 | 1 |
64 | 1 | 2 | 0 | 1 |
78 | 0 | 2 | 0 | 1 |
82 | 0 | 2 | 0 | 1 |
86 | 0 | 2 | 0 | 1 |
92 | 0 | 2 | 0 | 1 |
Example data
The status
= 1 means not censored, status
= 0 means being censored.
Months
indicates the survival time.
Since the original data does not gave the count number of $R_{tj}$, we have to calculate it first.
library(tidyverse)
## survival time
event_time <- c()
## risk sets of group 1 and group 2
Rt1 <- c()
Rt2 <- c()
## i records survival time when one experiences event of interest
i <- 1
## j records the length of survival time vector
j <- 1
## the maximum of i is the largest survival time,
while (i <= max(two_group$Months)) {
## skip time = i if no occurrence of event
if (!i %in% two_group$Months) {
i <- i + 1
} else {
event_time[j] <- i
## count the risk set in group 1
Rt1[j] <- two_group %>%
filter(Group == 1) %>%
filter(Months >= event_time[j]) %>%
nrow()
## count the risk set in group 2
Rt2[j] <- two_group %>%
filter(Group == 2) %>%
filter(Months >= event_time[j]) %>%
nrow()
j <- j + 1
i <- i + 1
}
}
Rt_df <- tibble::tibble(Months = event_time, Rt1, Rt2)
knitr::kable(Rt_df, caption = "Risk sets at each survival time",align = "c")
Months | Rt1 | Rt2 |
---|---|---|
1 | 23 | 23 |
2 | 23 | 21 |
7 | 23 | 20 |
8 | 22 | 20 |
9 | 22 | 19 |
14 | 22 | 17 |
17 | 21 | 16 |
20 | 21 | 15 |
24 | 21 | 14 |
27 | 20 | 14 |
34 | 18 | 13 |
43 | 18 | 12 |
45 | 18 | 11 |
47 | 18 | 10 |
50 | 18 | 9 |
51 | 17 | 9 |
55 | 16 | 9 |
56 | 16 | 8 |
57 | 15 | 7 |
58 | 15 | 6 |
60 | 14 | 6 |
62 | 13 | 6 |
64 | 12 | 5 |
69 | 12 | 4 |
71 | 11 | 4 |
74 | 10 | 4 |
76 | 8 | 4 |
78 | 7 | 4 |
80 | 7 | 3 |
82 | 4 | 3 |
86 | 4 | 2 |
88 | 4 | 1 |
92 | 3 | 1 |
93 | 3 | 0 |
98 | 2 | 0 |
104 | 1 | 0 |
Risk sets at each survival time
## Find the observed events Et
Et_df <- two_group %>%
select(Months, Status, Group) %>%
arrange(Months) %>%
group_by(Months, Group) %>%
## get the count
summarise(n = sum(Status), .groups = "drop") %>%
## make the table in wide format
pivot_wider(
names_from = Group,
names_glue = "Observed.Group{Group}",
values_from = n,
values_fill = 0
)
knitr::kable(Et_df, caption = "Expected event at each survival time",align = "c")
Months | Observed.Group2 | Observed.Group1 |
---|---|---|
1 | 2 | 0 |
2 | 1 | 0 |
7 | 0 | 1 |
8 | 1 | 0 |
9 | 2 | 0 |
14 | 1 | 1 |
17 | 1 | 0 |
20 | 1 | 0 |
24 | 0 | 1 |
27 | 1 | 2 |
34 | 1 | 0 |
43 | 1 | 0 |
45 | 1 | 0 |
47 | 1 | 0 |
50 | 0 | 1 |
51 | 0 | 1 |
55 | 1 | 0 |
56 | 1 | 1 |
57 | 0 | 0 |
58 | 0 | 1 |
60 | 0 | 1 |
62 | 1 | 1 |
64 | 1 | 0 |
69 | 0 | 0 |
71 | 0 | 1 |
74 | 0 | 2 |
76 | 0 | 1 |
78 | 0 | 0 |
80 | 0 | 0 |
82 | 0 | 0 |
86 | 0 | 0 |
88 | 0 | 0 |
92 | 0 | 0 |
93 | 0 | 1 |
98 | 0 | 1 |
104 | 0 | 0 |
Expected event at each survival time
df = Et_df %>%
## join with the previously found risk set
left_join(Rt_df, by = "Months") %>%
mutate(
## the the total Rt = Rt1 + Rt2
Rt = Rt1 + Rt2,
.after = Months
) %>%
## calculate the expected event
mutate(
Expected_Et1 = (Rt1 / Rt) * (Observed.Group1 + Observed.Group2),
Expected_Et2 = (Rt2 / Rt) * (Observed.Group1 + Observed.Group2),
Et = Expected_Et1 + Expected_Et2,
Variance = (Rt1 * Rt2 * Et) / (Rt^2 * (Rt - 1)) * (Rt - Et),
bt = Rt * (Observed.Group1 - Expected_Et1),
gt = Rt^2 * Variance
)
knitr::kable(df, caption = "Data needed for tests",digits = 2,align = "c")
Months | Rt | Observed.Group2 | Observed.Group1 | Rt1 | Rt2 | Expected_Et1 | Expected_Et2 | Et | Variance | bt | gt |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 46 | 2 | 0 | 23 | 23 | 1.00 | 1.00 | 2 | 0.49 | -46 | 1034.49 |
2 | 44 | 1 | 0 | 23 | 21 | 0.52 | 0.48 | 1 | 0.25 | -23 | 483.00 |
7 | 43 | 0 | 1 | 23 | 20 | 0.53 | 0.47 | 1 | 0.25 | 20 | 460.00 |
8 | 42 | 1 | 0 | 22 | 20 | 0.52 | 0.48 | 1 | 0.25 | -22 | 440.00 |
9 | 41 | 2 | 0 | 22 | 19 | 1.07 | 0.93 | 2 | 0.48 | -44 | 815.10 |
14 | 39 | 1 | 1 | 22 | 17 | 1.13 | 0.87 | 2 | 0.48 | -5 | 728.32 |
17 | 37 | 1 | 0 | 21 | 16 | 0.57 | 0.43 | 1 | 0.25 | -21 | 336.00 |
20 | 36 | 1 | 0 | 21 | 15 | 0.58 | 0.42 | 1 | 0.24 | -21 | 315.00 |
24 | 35 | 0 | 1 | 21 | 14 | 0.60 | 0.40 | 1 | 0.24 | 14 | 294.00 |
27 | 34 | 1 | 2 | 20 | 14 | 1.76 | 1.24 | 3 | 0.68 | 8 | 789.09 |
34 | 31 | 1 | 0 | 18 | 13 | 0.58 | 0.42 | 1 | 0.24 | -18 | 234.00 |
43 | 30 | 1 | 0 | 18 | 12 | 0.60 | 0.40 | 1 | 0.24 | -18 | 216.00 |
45 | 29 | 1 | 0 | 18 | 11 | 0.62 | 0.38 | 1 | 0.24 | -18 | 198.00 |
47 | 28 | 1 | 0 | 18 | 10 | 0.64 | 0.36 | 1 | 0.23 | -18 | 180.00 |
50 | 27 | 0 | 1 | 18 | 9 | 0.67 | 0.33 | 1 | 0.22 | 9 | 162.00 |
51 | 26 | 0 | 1 | 17 | 9 | 0.65 | 0.35 | 1 | 0.23 | 9 | 153.00 |
55 | 25 | 1 | 0 | 16 | 9 | 0.64 | 0.36 | 1 | 0.23 | -16 | 144.00 |
56 | 24 | 1 | 1 | 16 | 8 | 1.33 | 0.67 | 2 | 0.43 | -8 | 244.87 |
57 | 22 | 0 | 0 | 15 | 7 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0.00 |
58 | 21 | 0 | 1 | 15 | 6 | 0.71 | 0.29 | 1 | 0.20 | 6 | 90.00 |
60 | 20 | 0 | 1 | 14 | 6 | 0.70 | 0.30 | 1 | 0.21 | 6 | 84.00 |
62 | 19 | 1 | 1 | 13 | 6 | 1.37 | 0.63 | 2 | 0.41 | -7 | 147.33 |
64 | 17 | 1 | 0 | 12 | 5 | 0.71 | 0.29 | 1 | 0.21 | -12 | 60.00 |
69 | 16 | 0 | 0 | 12 | 4 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0.00 |
71 | 15 | 0 | 1 | 11 | 4 | 0.73 | 0.27 | 1 | 0.20 | 4 | 44.00 |
74 | 14 | 0 | 2 | 10 | 4 | 1.43 | 0.57 | 2 | 0.38 | 8 | 73.85 |
76 | 12 | 0 | 1 | 8 | 4 | 0.67 | 0.33 | 1 | 0.22 | 4 | 32.00 |
78 | 11 | 0 | 0 | 7 | 4 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0.00 |
80 | 10 | 0 | 0 | 7 | 3 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0.00 |
82 | 7 | 0 | 0 | 4 | 3 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0.00 |
86 | 6 | 0 | 0 | 4 | 2 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0.00 |
88 | 5 | 0 | 0 | 4 | 1 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0.00 |
92 | 4 | 0 | 0 | 3 | 1 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0.00 |
93 | 3 | 0 | 1 | 3 | 0 | 1.00 | 0.00 | 1 | 0.00 | 0 | 0.00 |
98 | 2 | 0 | 1 | 2 | 0 | 1.00 | 0.00 | 1 | 0.00 | 0 | 0.00 |
104 | 1 | 0 | 0 | 1 | 0 | 0.00 | 0.00 | 0 | NaN | 0 | NaN |
Data needed for tests
total_df <- df %>%
summarise(across(Observed.Group2:gt, ~ sum(.x, na.rm = TRUE)))
knitr::kable(total_df, caption = "Total events and variance",digits = 2,align = "c" )
Observed.Group2 | Observed.Group1 | Rt1 | Rt2 | Expected_Et1 | Expected_Et2 | Et | Variance | bt | gt |
---|---|---|---|---|---|---|---|---|---|
18 | 17 | 502 | 320 | 22.35 | 12.65 | 35 | 7.49 | -209 | 7758.04 |
Total events and variance
$\begin{aligned} \frac{(\sum_{t} E_{t1}-\hat{E}_{t1})^2}{\sum_{t}V_{t}} & = \frac{(17-22.3536)^{2}}{7.48837} \\ & = 3.827406 \end{aligned}$
1 - pchisq(3.827406, df= 1)
## [1] 0.05042091
$\frac{(\sum_{t} b_{t})^2}{\sum_{t}g_{t}} \sim \chi^2(1)$
wilcoxon_test_statistics = (total_df$bt)^2 / total_df$gt
glue::glue("The Wilcoxon test statistics is: {wilcoxon_test_statistics} \n
The p-value is: { 1 - pchisq(wilcoxon_test_statistics, df= 1)}")
The Wilcoxon test statistics is: 5.63041359360399
The p-value is: 0.0176514662435399
survival
R packageOne can use survdiff
from the R package survival
to compare two
survival functions. A general process is below:
Surv
function to create survival time object.survfit
with survival time objectsurvdiff
with survivalrho
argument specifies whichlibrary(survival)
## create surv time object
survtime_obj <- Surv(two_group$Months, two_group$Status)
## fit the KM estimate
KM_estimate <- survfit(survtime_obj ~ Group, data = two_group)
## plot the KM
plot(KM_estimate)
## the log rank test
logrank <- survdiff(survtime_obj ~ Group, data = two_group, rho = 0) # log rank
print(logrank)
## Call:
## survdiff(formula = survtime_obj ~ Group, data = two_group, rho = 0)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=1 23 17 22.4 1.28 3.83
## Group=2 23 18 12.6 2.27 3.83
##
## Chisq= 3.8 on 1 degrees of freedom, p= 0.05
## the Wilcoxon test
wilcoxon<-survdiff(survtime_obj ~ Group, data = two_group,rho=1) # Wilcoxon
print(wilcoxon)
## Call:
## survdiff(formula = survtime_obj ~ Group, data = two_group, rho = 1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=1 23 8.84 13.35 1.52 5.49
## Group=2 23 13.28 8.77 2.32 5.49
##
## Chisq= 5.5 on 1 degrees of freedom, p= 0.02
The results above is similar to the manual calculation ones.
I hope you will find this post helpful in understanding the Log-Rank
Test and Generalized Wilcoxon test. Though in real life it is easy to
get the tests’ results by survival
package, it is good to know the
calculations under the hood. Please note that they are non-parametric
tests which means there is no distribution assumption on the survival
time.