Bolin Wu

Survival Analysis 3: Non-Parametric Comparison of Survival Functions

2023-07-27 · 19 min read
survival_analysis R

This post is to share the two common non-parametric tests of comparing the survival functions: Log-Rank Test & Generalized Wilcoxon Test, as well as their corresponding calculations in the detailed process.

Introduction

Assume we have two groups of samples, one with treatment and the other without treatment. Their survival functions are S1(t)S_{1}(t) and S2(t)S_{2}(t). We aim to test if these two groups' survival functions are different are not: H0:H_{0}: S1(t)=S2(t)S_{1}(t) = S_{2}(t).

If there is no censoring , we can use standard non-parametric methods, e.g., Wilcoxon test. However, with presense of censoring we have to use special non-parametric tests. Two common tests are Log-Rank Test and Generalized Wilcoxon Test.

The Log-Rank Test

Let's start with this this setting:

  • Two groups
  • RtjR_{tj} = the count of samples exposed to risk at time t in the jthj^{th} group, j = 1 or 2.
  • EtjE_{tj} = the count of observed events at time t in the jthj^{th} group.
  • EtE_{t} = Et1+Et2E_{t1} + E_{t2}
  • RtR_{t} = Rt1+Rt2R_{t1} + R_{t2}

There are two important estimators, expected number of events and variance:

The expected events in the first group:

E^t1=Rt1Rt(Et)\hat{E}_{t1} = \frac{R_{t1}}{R_{t}}(E_{t})

At each event time t, the variance is

Var(E^t1)=Rt1Rt2EtRt2(Rt1)(RtEt)Var(\hat{E}_{t1}) = \frac{R_{t1}R_{t2}E_{t}}{R_{t}^2 (R_{t} - 1)}(R_{t} - E_{t})

With these two estimators, we can get the Log-Rank Test statistics as:

t(Et1E^t1)tVtN(0,1)\frac{\sum_{t} (E_{t1}-\hat{E}_{t1})}{\sqrt{\sum_{t}V_{t}}} \sim N(0,1)

Or equivalently:

(tEt1E^t1)2tVtχ2(1)\frac{(\sum_{t} E_{t1}-\hat{E}_{t1})^2}{\sum_{t}V_{t}} \sim \chi^2(1)

Generalized Wilcoxon Test

The Generalized Wilcoxon Test is similar to the Log-Rank Test. One difference is that the Log-Rank Test is based on the differences of Et1E^t1E_{t1} - \hat{E}_{t1} and the variance VtV_{t}. Whereas the Generalized Wilcoxon Test uses a weighted version of the differences:

  • A weighted difference

bt=Rt(Et1E^t1)b_{t} = R_{t}(E_{t1}-\hat{E}_{t1})

  • A weighted variance

gt=Rt2Vtg_{t} = R_{t}^2V_{t}

  • Thus the test statistics becomes

(tbt)2tgtχ2(1)\frac{(\sum_{t} b_{t})^2}{\sum_{t}g_{t}} \sim \chi^2(1)

We must note that the risk set RtR_{t} is large at the beginning of the study period and gets smaller towards the end. As a result, Et1E^t1E_{t1} - \hat{E}_{t1} gets larger weights at the beginning and smaller weights towards the end. Therefore, Log-Rank and Generalized Wilcoxon may lead to different results when the Et1E^t1E_{t1} - \hat{E}_{t1} are not uniform.

You may wonder when to use which test. This paper gives comprehensive explanations.

In summary, when you want to compare the survival curves of two independent groups:

  • Use the log-rank test when you assume proportional hazards;
  • Use the generalized Wilcoxon test if the proportional hazard assumption is violated and you want a test of more weight at the beginning stage;
  • If you suspect the proportional hazards assumption is violated, but the separation pattern of the two curves is late, you can use the log-rank test. This suggestion may appear to be controversial. Please see the explanation for the table 6 in the paper mentioned above.

Exercise

The example data is from the SU survival analysis course. Before applying the survival R package, I would like to first dementrate the manual calculation.

Months Status Group Group1 Group2
7 1 1 1 0
14 1 1 1 0
24 1 1 1 0
27 1 1 1 0
27 1 1 1 0
50 1 1 1 0
51 1 1 1 0
56 1 1 1 0
58 1 1 1 0
60 1 1 1 0
62 1 1 1 0
69 0 1 1 0
71 1 1 1 0
74 1 1 1 0
74 1 1 1 0
76 1 1 1 0
80 0 1 1 0
80 0 1 1 0
80 0 1 1 0
88 0 1 1 0
93 1 1 1 0
98 1 1 1 0
104 0 1 1 0
1 1 2 0 1
1 1 2 0 1
2 1 2 0 1
8 1 2 0 1
9 1 2 0 1
9 1 2 0 1
14 1 2 0 1
17 1 2 0 1
20 1 2 0 1
27 1 2 0 1
34 1 2 0 1
43 1 2 0 1
45 1 2 0 1
47 1 2 0 1
55 1 2 0 1
56 1 2 0 1
57 0 2 0 1
62 1 2 0 1
64 1 2 0 1
78 0 2 0 1
82 0 2 0 1
86 0 2 0 1
92 0 2 0 1

Example data

  • The status = 1 means not censored, status = 0 means being censored.

  • Months indicates the survival time.

Since the original data does not gave the count number of RtjR_{tj}, we have to calculate it first.

library(tidyverse)

## survival time
event_time <- c()
## risk sets of group 1 and group 2
Rt1 <- c()
Rt2 <- c()
## i records survival time when one experiences event of interest
i <- 1
## j records the length of survival time vector
j <- 1
## the maximum of i is the largest survival time,
while (i <= max(two_group$Months)) {
  ## skip time = i if no occurrence of event
  if (!i %in% two_group$Months) {
    i <- i + 1
  } else {
    event_time[j] <- i
    ## count the risk set in group 1
    Rt1[j] <- two_group %>%
      filter(Group == 1) %>%
      filter(Months >= event_time[j]) %>%
      nrow()
    ## count the risk set in group 2
    Rt2[j] <- two_group %>%
      filter(Group == 2) %>%
      filter(Months >= event_time[j]) %>%
      nrow()
    j <- j + 1
    i <- i + 1
  }
}

Rt_df <- tibble::tibble(Months = event_time, Rt1, Rt2)
knitr::kable(Rt_df, caption = "Risk sets at each survival time",align = "c")
Months Rt1 Rt2
1 23 23
2 23 21
7 23 20
8 22 20
9 22 19
14 22 17
17 21 16
20 21 15
24 21 14
27 20 14
34 18 13
43 18 12
45 18 11
47 18 10
50 18 9
51 17 9
55 16 9
56 16 8
57 15 7
58 15 6
60 14 6
62 13 6
64 12 5
69 12 4
71 11 4
74 10 4
76 8 4
78 7 4
80 7 3
82 4 3
86 4 2
88 4 1
92 3 1
93 3 0
98 2 0
104 1 0

Risk sets at each survival time

  • Then we have to count the occurrences of event EtjE_{tj}.
## Find the observed events Et
Et_df <- two_group %>%
  select(Months, Status, Group) %>%
  arrange(Months) %>%
  group_by(Months, Group) %>%
  ## get the count
  summarise(n = sum(Status), .groups = "drop") %>%
  ## make the table in wide format
  pivot_wider(
    names_from = Group,
    names_glue = "Observed.Group{Group}",
    values_from = n,
    values_fill = 0
  )
knitr::kable(Et_df, caption = "Expected event at each survival time",align = "c")
Months Observed.Group2 Observed.Group1
1 2 0
2 1 0
7 0 1
8 1 0
9 2 0
14 1 1
17 1 0
20 1 0
24 0 1
27 1 2
34 1 0
43 1 0
45 1 0
47 1 0
50 0 1
51 0 1
55 1 0
56 1 1
57 0 0
58 0 1
60 0 1
62 1 1
64 1 0
69 0 0
71 0 1
74 0 2
76 0 1
78 0 0
80 0 0
82 0 0
86 0 0
88 0 0
92 0 0
93 0 1
98 0 1
104 0 0

Expected event at each survival time

  • Join the two tables together, find the E^t1\hat{E}_{t1} and E^t2\hat{E}_{t2}, variance, gtg_{t} and btb_{t}.
df = Et_df %>%
  ## join with the previously found risk set
  left_join(Rt_df, by = "Months") %>% 
  mutate(
    ## the the total Rt = Rt1 + Rt2
    Rt = Rt1 + Rt2,
    .after = Months
  ) %>%
  ## calculate the expected event
  mutate(
    Expected_Et1 = (Rt1 / Rt) * (Observed.Group1 + Observed.Group2),
    Expected_Et2 = (Rt2 / Rt) * (Observed.Group1 + Observed.Group2),
    Et = Expected_Et1 + Expected_Et2,
    Variance = (Rt1 * Rt2 * Et) / (Rt^2 * (Rt - 1)) * (Rt - Et),
    bt = Rt * (Observed.Group1 - Expected_Et1),
    gt = Rt^2 * Variance
  )
knitr::kable(df, caption = "Data needed for tests",digits = 2,align = "c")
Months Rt Observed.Group2 Observed.Group1 Rt1 Rt2 Expected_Et1 Expected_Et2 Et Variance bt gt
1 46 2 0 23 23 1.00 1.00 2 0.49 -46 1034.49
2 44 1 0 23 21 0.52 0.48 1 0.25 -23 483.00
7 43 0 1 23 20 0.53 0.47 1 0.25 20 460.00
8 42 1 0 22 20 0.52 0.48 1 0.25 -22 440.00
9 41 2 0 22 19 1.07 0.93 2 0.48 -44 815.10
14 39 1 1 22 17 1.13 0.87 2 0.48 -5 728.32
17 37 1 0 21 16 0.57 0.43 1 0.25 -21 336.00
20 36 1 0 21 15 0.58 0.42 1 0.24 -21 315.00
24 35 0 1 21 14 0.60 0.40 1 0.24 14 294.00
27 34 1 2 20 14 1.76 1.24 3 0.68 8 789.09
34 31 1 0 18 13 0.58 0.42 1 0.24 -18 234.00
43 30 1 0 18 12 0.60 0.40 1 0.24 -18 216.00
45 29 1 0 18 11 0.62 0.38 1 0.24 -18 198.00
47 28 1 0 18 10 0.64 0.36 1 0.23 -18 180.00
50 27 0 1 18 9 0.67 0.33 1 0.22 9 162.00
51 26 0 1 17 9 0.65 0.35 1 0.23 9 153.00
55 25 1 0 16 9 0.64 0.36 1 0.23 -16 144.00
56 24 1 1 16 8 1.33 0.67 2 0.43 -8 244.87
57 22 0 0 15 7 0.00 0.00 0 0.00 0 0.00
58 21 0 1 15 6 0.71 0.29 1 0.20 6 90.00
60 20 0 1 14 6 0.70 0.30 1 0.21 6 84.00
62 19 1 1 13 6 1.37 0.63 2 0.41 -7 147.33
64 17 1 0 12 5 0.71 0.29 1 0.21 -12 60.00
69 16 0 0 12 4 0.00 0.00 0 0.00 0 0.00
71 15 0 1 11 4 0.73 0.27 1 0.20 4 44.00
74 14 0 2 10 4 1.43 0.57 2 0.38 8 73.85
76 12 0 1 8 4 0.67 0.33 1 0.22 4 32.00
78 11 0 0 7 4 0.00 0.00 0 0.00 0 0.00
80 10 0 0 7 3 0.00 0.00 0 0.00 0 0.00
82 7 0 0 4 3 0.00 0.00 0 0.00 0 0.00
86 6 0 0 4 2 0.00 0.00 0 0.00 0 0.00
88 5 0 0 4 1 0.00 0.00 0 0.00 0 0.00
92 4 0 0 3 1 0.00 0.00 0 0.00 0 0.00
93 3 0 1 3 0 1.00 0.00 1 0.00 0 0.00
98 2 0 1 2 0 1.00 0.00 1 0.00 0 0.00
104 1 0 0 1 0 0.00 0.00 0 NaN 0 NaN

Data needed for tests

  • Get the total events and variance, we need them for test statistics.
total_df <- df %>%
  summarise(across(Observed.Group2:gt, ~ sum(.x, na.rm = TRUE)))


knitr::kable(total_df, caption = "Total events and variance",digits = 2,align = "c" )
Observed.Group2 Observed.Group1 Rt1 Rt2 Expected_Et1 Expected_Et2 Et Variance bt gt
18 17 502 320 22.35 12.65 35 7.49 -209 7758.04

Total events and variance

The Log-Rank Test

  • The test statistics is:

(tEt1E^t1)2tVt=(1722.3536)27.48837=3.827406\begin{aligned} \frac{(\sum_{t} E_{t1}-\hat{E}_{t1})^2}{\sum_{t}V_{t}} & = \frac{(17-22.3536)^{2}}{7.48837} \\ & = 3.827406 \end{aligned}

  • The degree of freedom is 1, put the result above in chi-square distribution function, we have:
1 - pchisq(3.827406, df= 1) 
## [1] 0.05042091
  • it is not less than 0.05, therefore we can not reject the null hypothesis H0:S1(t)=S2(t)H_{0}: S_{1}(t) = S_{2}(t).

Generalized Wilcoxon test

  • Given the formula:

(tbt)2tgtχ2(1)\frac{(\sum_{t} b_{t})^2}{\sum_{t}g_{t}} \sim \chi^2(1)

  • Put in the results in the table "Total events and variance":
wilcoxon_test_statistics = (total_df$bt)^2 / total_df$gt
glue::glue("The Wilcoxon test statistics is: {wilcoxon_test_statistics} \n 
           The p-value is: { 1 - pchisq(wilcoxon_test_statistics, df= 1)}")
The Wilcoxon test statistics is: 5.63041359360399 
 
The p-value is: 0.0176514662435399
  • The p-value is less than 0.05, therefore we can reject the null hypothesis, H0:S1(t)=S2(t)H_{0}: S_{1}(t) = S_{2}(t).

Comparison

  • Log-rank and Generalized Wilcoxon lead to different restult. Log-rank test is not significant whereas the other is significant.
  • The reason is as previously mentioned, the Generalized Wilcoxon is weighed with RtR_{t}. RtR_{t} is large at the beginning period and smaller in the end. Therefore if the difference of observed events and expected events is not uniform throughout the study period, then these two tests will lead to different results.

Use survival R package

One can use survdiff from the R package survival to compare two
survival functions. A general process is below:

  • Use the Surv function to create survival time object.
  • To create KM estimation: Feed the survfit with survival time object
    and group variable. One can use it for KM estimation.
  • To make non-parametric comparison, feed the survdiff with survival
    time object and group variable. The rho argument specifies which
    method to apply: 0 = the log-rank test, 1 = Peto&Peto modification of
    the Wilcoxon test.
library(survival)
## create surv time object
survtime_obj <- Surv(two_group$Months, two_group$Status)
## fit the KM estimate
KM_estimate <- survfit(survtime_obj ~ Group, data = two_group)
## plot the KM
plot(KM_estimate)
## the log rank test
logrank <- survdiff(survtime_obj ~ Group, data = two_group, rho = 0) # log rank
print(logrank)
## Call:
## survdiff(formula = survtime_obj ~ Group, data = two_group, rho = 0)
## 
##          N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=1 23       17     22.4      1.28      3.83
## Group=2 23       18     12.6      2.27      3.83
## 
##  Chisq= 3.8  on 1 degrees of freedom, p= 0.05
## the Wilcoxon test
wilcoxon<-survdiff(survtime_obj ~ Group, data = two_group,rho=1) # Wilcoxon
print(wilcoxon)
## Call:
## survdiff(formula = survtime_obj ~ Group, data = two_group, rho = 1)
## 
##          N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=1 23     8.84    13.35      1.52      5.49
## Group=2 23    13.28     8.77      2.32      5.49
## 
##  Chisq= 5.5  on 1 degrees of freedom, p= 0.02

The results above is similar to the manual calculation ones.

End thoughs

I hope you will find this post helpful in understanding the Log-Rank
Test and Generalized Wilcoxon test. Though in real life it is easy to
get the tests’ results by survival package, it is good to know the
calculations under the hood. Please note that they are non-parametric
tests which means there is no distribution assumption on the survival
time.

Prudence is a fountain of life to the prudent.