Survival Analysis 3: Non-Parametric Comparison of Survival Functions

2023-07-27 · 19 min read

This post is to share the two common non-parametric tests of comparing the survival functions: Log-Rank Test & Generalized Wilcoxon Test, as well as their corresponding calculations in the detailed process.

Introduction

Assume we have two groups of samples, one with treatment and the other without treatment. Their survival functions are $S_{1}(t)$ and $S_{2}(t)$ . We aim to test if these two groups' survival functions are different are not: $H_{0}:$ $S_{1}(t) = S_{2}(t)$ .

If there is no censoring , we can use standard non-parametric methods, e.g., Wilcoxon test. However, with presense of censoring we have to use special non-parametric tests. Two common tests are Log-Rank Test and Generalized Wilcoxon Test.

The Log-Rank Test

Let's start with this this setting:

Two groups
$R_{tj}$ = the count of samples exposed to risk at time t in the $j^{th}$ group, j = 1 or 2.
$E_{tj}$ = the count of observed events at time t in the $j^{th}$ group.
$E_{t}$ = $E_{t1} + E_{t2}$
$R_{t}$ = $R_{t1} + R_{t2}$

There are two important estimators, expected number of events and variance:

The expected events in the first group:

\hat{E}_{t1} = \frac{R_{t1}}{R_{t}}(E_{t})

At each event time t, the variance is

Var(\hat{E}_{t1}) = \frac{R_{t1}R_{t2}E_{t}}{R_{t}^2 (R_{t} - 1)}(R_{t} - E_{t})

With these two estimators, we can get the Log-Rank Test statistics as:

\frac{\sum_{t} (E_{t1}-\hat{E}_{t1})}{\sqrt{\sum_{t}V_{t}}} \sim N(0,1)

Or equivalently:

\frac{(\sum_{t} E_{t1}-\hat{E}_{t1})^2}{\sum_{t}V_{t}} \sim \chi^2(1)

Generalized Wilcoxon Test

The Generalized Wilcoxon Test is similar to the Log-Rank Test. One difference is that the Log-Rank Test is based on the differences of $E_{t1} - \hat{E}_{t1}$ and the variance $V_{t}$ . Whereas the Generalized Wilcoxon Test uses a weighted version of the differences:

A weighted difference

b_{t} = R_{t}(E_{t1}-\hat{E}_{t1})

A weighted variance

g_{t} = R_{t}^2V_{t}

Thus the test statistics becomes

\frac{(\sum_{t} b_{t})^2}{\sum_{t}g_{t}} \sim \chi^2(1)

We must note that the risk set $R_{t}$ is large at the beginning of the study period and gets smaller towards the end. As a result, $E_{t1} - \hat{E}_{t1}$ gets larger weights at the beginning and smaller weights towards the end. Therefore, Log-Rank and Generalized Wilcoxon may lead to different results when the $E_{t1} - \hat{E}_{t1}$ are not uniform.

You may wonder when to use which test. This paper gives comprehensive explanations.

In summary, when you want to compare the survival curves of two independent groups:

Use the log-rank test when you assume proportional hazards;
Use the generalized Wilcoxon test if the proportional hazard assumption is violated and you want a test of more weight at the beginning stage;
If you suspect the proportional hazards assumption is violated, but the separation pattern of the two curves is late, you can use the log-rank test. This suggestion may appear to be controversial. Please see the explanation for the table 6 in the paper mentioned above.

Exercise

The example data is from the SU survival analysis course. Before applying the survival R package, I would like to first dementrate the manual calculation.

Months	Status	Group	Group1	Group2
7	1	1	1	0
14	1	1	1	0
24	1	1	1	0
27	1	1	1	0
27	1	1	1	0
50	1	1	1	0
51	1	1	1	0
56	1	1	1	0
58	1	1	1	0
60	1	1	1	0
62	1	1	1	0
69	0	1	1	0
71	1	1	1	0
74	1	1	1	0
74	1	1	1	0
76	1	1	1	0
80	0	1	1	0
80	0	1	1	0
80	0	1	1	0
88	0	1	1	0
93	1	1	1	0
98	1	1	1	0
104	0	1	1	0
1	1	2	0	1
1	1	2	0	1
2	1	2	0	1
8	1	2	0	1
9	1	2	0	1
9	1	2	0	1
14	1	2	0	1
17	1	2	0	1
20	1	2	0	1
27	1	2	0	1
34	1	2	0	1
43	1	2	0	1
45	1	2	0	1
47	1	2	0	1
55	1	2	0	1
56	1	2	0	1
57	0	2	0	1
62	1	2	0	1
64	1	2	0	1
78	0	2	0	1
82	0	2	0	1
86	0	2	0	1
92	0	2	0	1

Example data

The status = 1 means not censored, status = 0 means being censored.
Months indicates the survival time.

Since the original data does not gave the count number of $R_{tj}$ , we have to calculate it first.

library(tidyverse)

## survival time
event_time <- c()
## risk sets of group 1 and group 2
Rt1 <- c()
Rt2 <- c()
## i records survival time when one experiences event of interest
i <- 1
## j records the length of survival time vector
j <- 1
## the maximum of i is the largest survival time,
while (i <= max(two_group$Months)) {
  ## skip time = i if no occurrence of event
  if (!i %in% two_group$Months) {
    i <- i + 1
  } else {
    event_time[j] <- i
    ## count the risk set in group 1
    Rt1[j] <- two_group %>%
      filter(Group == 1) %>%
      filter(Months >= event_time[j]) %>%
      nrow()
    ## count the risk set in group 2
    Rt2[j] <- two_group %>%
      filter(Group == 2) %>%
      filter(Months >= event_time[j]) %>%
      nrow()
    j <- j + 1
    i <- i + 1
  }
}

Rt_df <- tibble::tibble(Months = event_time, Rt1, Rt2)
knitr::kable(Rt_df, caption = "Risk sets at each survival time",align = "c")

Months	Rt1	Rt2
1	23	23
2	23	21
7	23	20
8	22	20
9	22	19
14	22	17
17	21	16
20	21	15
24	21	14
27	20	14
34	18	13
43	18	12
45	18	11
47	18	10
50	18	9
51	17	9
55	16	9
56	16	8
57	15	7
58	15	6
60	14	6
62	13	6
64	12	5
69	12	4
71	11	4
74	10	4
76	8	4
78	7	4
80	7	3
82	4	3
86	4	2
88	4	1
92	3	1
93	3	0
98	2	0
104	1	0

Risk sets at each survival time

Then we have to count the occurrences of event $E_{tj}$ .

## Find the observed events Et
Et_df <- two_group %>%
  select(Months, Status, Group) %>%
  arrange(Months) %>%
  group_by(Months, Group) %>%
  ## get the count
  summarise(n = sum(Status), .groups = "drop") %>%
  ## make the table in wide format
  pivot_wider(
    names_from = Group,
    names_glue = "Observed.Group{Group}",
    values_from = n,
    values_fill = 0
  )
knitr::kable(Et_df, caption = "Expected event at each survival time",align = "c")

Months	Observed.Group2	Observed.Group1
1	2	0
2	1	0
7	0	1
8	1	0
9	2	0
14	1	1
17	1	0
20	1	0
24	0	1
27	1	2
34	1	0
43	1	0
45	1	0
47	1	0
50	0	1
51	0	1
55	1	0
56	1	1
57	0	0
58	0	1
60	0	1
62	1	1
64	1	0
69	0	0
71	0	1
74	0	2
76	0	1
78	0	0
80	0	0
82	0	0
86	0	0
88	0	0
92	0	0
93	0	1
98	0	1
104	0	0

Expected event at each survival time

Join the two tables together, find the $\hat{E}_{t1}$ and $\hat{E}_{t2}$ , variance, $g_{t}$ and $b_{t}$ .

df = Et_df %>%
  ## join with the previously found risk set
  left_join(Rt_df, by = "Months") %>% 
  mutate(
    ## the the total Rt = Rt1 + Rt2
    Rt = Rt1 + Rt2,
    .after = Months
  ) %>%
  ## calculate the expected event
  mutate(
    Expected_Et1 = (Rt1 / Rt) * (Observed.Group1 + Observed.Group2),
    Expected_Et2 = (Rt2 / Rt) * (Observed.Group1 + Observed.Group2),
    Et = Expected_Et1 + Expected_Et2,
    Variance = (Rt1 * Rt2 * Et) / (Rt^2 * (Rt - 1)) * (Rt - Et),
    bt = Rt * (Observed.Group1 - Expected_Et1),
    gt = Rt^2 * Variance
  )
knitr::kable(df, caption = "Data needed for tests",digits = 2,align = "c")

Months	Rt	Observed.Group2	Observed.Group1	Rt1	Rt2	Expected_Et1	Expected_Et2	Et	Variance	bt	gt
1	46	2	0	23	23	1.00	1.00	2	0.49	-46	1034.49
2	44	1	0	23	21	0.52	0.48	1	0.25	-23	483.00
7	43	0	1	23	20	0.53	0.47	1	0.25	20	460.00
8	42	1	0	22	20	0.52	0.48	1	0.25	-22	440.00
9	41	2	0	22	19	1.07	0.93	2	0.48	-44	815.10
14	39	1	1	22	17	1.13	0.87	2	0.48	-5	728.32
17	37	1	0	21	16	0.57	0.43	1	0.25	-21	336.00
20	36	1	0	21	15	0.58	0.42	1	0.24	-21	315.00
24	35	0	1	21	14	0.60	0.40	1	0.24	14	294.00
27	34	1	2	20	14	1.76	1.24	3	0.68	8	789.09
34	31	1	0	18	13	0.58	0.42	1	0.24	-18	234.00
43	30	1	0	18	12	0.60	0.40	1	0.24	-18	216.00
45	29	1	0	18	11	0.62	0.38	1	0.24	-18	198.00
47	28	1	0	18	10	0.64	0.36	1	0.23	-18	180.00
50	27	0	1	18	9	0.67	0.33	1	0.22	9	162.00
51	26	0	1	17	9	0.65	0.35	1	0.23	9	153.00
55	25	1	0	16	9	0.64	0.36	1	0.23	-16	144.00
56	24	1	1	16	8	1.33	0.67	2	0.43	-8	244.87
57	22	0	0	15	7	0.00	0.00	0	0.00	0	0.00
58	21	0	1	15	6	0.71	0.29	1	0.20	6	90.00
60	20	0	1	14	6	0.70	0.30	1	0.21	6	84.00
62	19	1	1	13	6	1.37	0.63	2	0.41	-7	147.33
64	17	1	0	12	5	0.71	0.29	1	0.21	-12	60.00
69	16	0	0	12	4	0.00	0.00	0	0.00	0	0.00
71	15	0	1	11	4	0.73	0.27	1	0.20	4	44.00
74	14	0	2	10	4	1.43	0.57	2	0.38	8	73.85
76	12	0	1	8	4	0.67	0.33	1	0.22	4	32.00
78	11	0	0	7	4	0.00	0.00	0	0.00	0	0.00
80	10	0	0	7	3	0.00	0.00	0	0.00	0	0.00
82	7	0	0	4	3	0.00	0.00	0	0.00	0	0.00
86	6	0	0	4	2	0.00	0.00	0	0.00	0	0.00
88	5	0	0	4	1	0.00	0.00	0	0.00	0	0.00
92	4	0	0	3	1	0.00	0.00	0	0.00	0	0.00
93	3	0	1	3	0	1.00	0.00	1	0.00	0	0.00
98	2	0	1	2	0	1.00	0.00	1	0.00	0	0.00
104	1	0	0	1	0	0.00	0.00	0	NaN	0	NaN

Data needed for tests

Get the total events and variance, we need them for test statistics.

total_df <- df %>%
  summarise(across(Observed.Group2:gt, ~ sum(.x, na.rm = TRUE)))


knitr::kable(total_df, caption = "Total events and variance",digits = 2,align = "c" )

Observed.Group2	Observed.Group1	Rt1	Rt2	Expected_Et1	Expected_Et2	Et	Variance	bt	gt
18	17	502	320	22.35	12.65	35	7.49	-209	7758.04

Total events and variance

The Log-Rank Test

The test statistics is:

\begin{aligned} \frac{(\sum_{t} E_{t1}-\hat{E}_{t1})^2}{\sum_{t}V_{t}} & = \frac{(17-22.3536)^{2}}{7.48837} \\ & = 3.827406 \end{aligned}

The degree of freedom is 1, put the result above in chi-square distribution function, we have:

1 - pchisq(3.827406, df= 1)

## [1] 0.05042091

it is not less than 0.05, therefore we can not reject the null hypothesis $H_{0}: S_{1}(t) = S_{2}(t)$ .

Generalized Wilcoxon test

Given the formula:

\frac{(\sum_{t} b_{t})^2}{\sum_{t}g_{t}} \sim \chi^2(1)

Put in the results in the table "Total events and variance":

wilcoxon_test_statistics = (total_df$bt)^2 / total_df$gt
glue::glue("The Wilcoxon test statistics is: {wilcoxon_test_statistics} \n 
           The p-value is: { 1 - pchisq(wilcoxon_test_statistics, df= 1)}")

The Wilcoxon test statistics is: 5.63041359360399 
 
The p-value is: 0.0176514662435399

The p-value is less than 0.05, therefore we can reject the null hypothesis, $H_{0}: S_{1}(t) = S_{2}(t)$ .

Comparison

Log-rank and Generalized Wilcoxon lead to different restult. Log-rank test is not significant whereas the other is significant.
The reason is as previously mentioned, the Generalized Wilcoxon is weighed with $R_{t}$ . $R_{t}$ is large at the beginning period and smaller in the end. Therefore if the difference of observed events and expected events is not uniform throughout the study period, then these two tests will lead to different results.

Use `survival` R package

One can use survdiff from the R package survival to compare two
survival functions. A general process is below:

Use the Surv function to create survival time object.
To create KM estimation: Feed the survfit with survival time object
and group variable. One can use it for KM estimation.
To make non-parametric comparison, feed the survdiff with survival
time object and group variable. The rho argument specifies which
method to apply: 0 = the log-rank test, 1 = Peto&Peto modification of
the Wilcoxon test.

library(survival)
## create surv time object
survtime_obj <- Surv(two_group$Months, two_group$Status)
## fit the KM estimate
KM_estimate <- survfit(survtime_obj ~ Group, data = two_group)
## plot the KM
plot(KM_estimate)

## the log rank test
logrank <- survdiff(survtime_obj ~ Group, data = two_group, rho = 0) # log rank
print(logrank)

## Call:
## survdiff(formula = survtime_obj ~ Group, data = two_group, rho = 0)
## 
##          N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=1 23       17     22.4      1.28      3.83
## Group=2 23       18     12.6      2.27      3.83
## 
##  Chisq= 3.8  on 1 degrees of freedom, p= 0.05

## the Wilcoxon test
wilcoxon<-survdiff(survtime_obj ~ Group, data = two_group,rho=1) # Wilcoxon
print(wilcoxon)

## Call:
## survdiff(formula = survtime_obj ~ Group, data = two_group, rho = 1)
## 
##          N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=1 23     8.84    13.35      1.52      5.49
## Group=2 23    13.28     8.77      2.32      5.49
## 
##  Chisq= 5.5  on 1 degrees of freedom, p= 0.02

The results above is similar to the manual calculation ones.

End thoughs

I hope you will find this post helpful in understanding the Log-Rank
Test and Generalized Wilcoxon test. Though in real life it is easy to
get the tests’ results by survival package, it is good to know the
calculations under the hood. Please note that they are non-parametric
tests which means there is no distribution assumption on the survival
time.

Introduction
The Log-Rank Test
Generalized Wilcoxon Test
Exercise
End thoughs

Months	Status	Group	Group1	Group2
7	1	1	1	0
14	1	1	1	0
24	1	1	1	0
27	1	1	1	0
27	1	1	1	0
50	1	1	1	0
51	1	1	1	0
56	1	1	1	0
58	1	1	1	0
60	1	1	1	0
62	1	1	1	0
69	0	1	1	0
71	1	1	1	0
74	1	1	1	0
74	1	1	1	0
76	1	1	1	0
80	0	1	1	0
80	0	1	1	0
80	0	1	1	0
88	0	1	1	0
93	1	1	1	0
98	1	1	1	0
104	0	1	1	0
1	1	2	0	1
1	1	2	0	1
2	1	2	0	1
8	1	2	0	1
9	1	2	0	1
9	1	2	0	1
14	1	2	0	1
17	1	2	0	1
20	1	2	0	1
27	1	2	0	1
34	1	2	0	1
43	1	2	0	1
45	1	2	0	1
47	1	2	0	1
55	1	2	0	1
56	1	2	0	1
57	0	2	0	1
62	1	2	0	1
64	1	2	0	1
78	0	2	0	1
82	0	2	0	1
86	0	2	0	1
92	0	2	0	1

Months	Rt1	Rt2
1	23	23
2	23	21
7	23	20
8	22	20
9	22	19
14	22	17
17	21	16
20	21	15
24	21	14
27	20	14
34	18	13
43	18	12
45	18	11
47	18	10
50	18	9
51	17	9
55	16	9
56	16	8
57	15	7
58	15	6
60	14	6
62	13	6
64	12	5
69	12	4
71	11	4
74	10	4
76	8	4
78	7	4
80	7	3
82	4	3
86	4	2
88	4	1
92	3	1
93	3	0
98	2	0
104	1	0

Months	Observed.Group2	Observed.Group1
1	2	0
2	1	0
7	0	1
8	1	0
9	2	0
14	1	1
17	1	0
20	1	0
24	0	1
27	1	2
34	1	0
43	1	0
45	1	0
47	1	0
50	0	1
51	0	1
55	1	0
56	1	1
57	0	0
58	0	1
60	0	1
62	1	1
64	1	0
69	0	0
71	0	1
74	0	2
76	0	1
78	0	0
80	0	0
82	0	0
86	0	0
88	0	0
92	0	0
93	0	1
98	0	1
104	0	0

Months	Status	Group	Group1	Group2
7	1	1	1	0
14	1	1	1	0
24	1	1	1	0
27	1	1	1	0
27	1	1	1	0
50	1	1	1	0
51	1	1	1	0
56	1	1	1	0
58	1	1	1	0
60	1	1	1	0
62	1	1	1	0
69	0	1	1	0
71	1	1	1	0
74	1	1	1	0
74	1	1	1	0
76	1	1	1	0
80	0	1	1	0
80	0	1	1	0
80	0	1	1	0
88	0	1	1	0
93	1	1	1	0
98	1	1	1	0
104	0	1	1	0
1	1	2	0	1
1	1	2	0	1
2	1	2	0	1
8	1	2	0	1
9	1	2	0	1
9	1	2	0	1
14	1	2	0	1
17	1	2	0	1
20	1	2	0	1
27	1	2	0	1
34	1	2	0	1
43	1	2	0	1
45	1	2	0	1
47	1	2	0	1
55	1	2	0	1
56	1	2	0	1
57	0	2	0	1
62	1	2	0	1
64	1	2	0	1
78	0	2	0	1
82	0	2	0	1
86	0	2	0	1
92	0	2	0	1

Months	Rt1	Rt2
1	23	23
2	23	21
7	23	20
8	22	20
9	22	19
14	22	17
17	21	16
20	21	15
24	21	14
27	20	14
34	18	13
43	18	12
45	18	11
47	18	10
50	18	9
51	17	9
55	16	9
56	16	8
57	15	7
58	15	6
60	14	6
62	13	6
64	12	5
69	12	4
71	11	4
74	10	4
76	8	4
78	7	4
80	7	3
82	4	3
86	4	2
88	4	1
92	3	1
93	3	0
98	2	0
104	1	0

Months	Observed.Group2	Observed.Group1
1	2	0
2	1	0
7	0	1
8	1	0
9	2	0
14	1	1
17	1	0
20	1	0
24	0	1
27	1	2
34	1	0
43	1	0
45	1	0
47	1	0
50	0	1
51	0	1
55	1	0
56	1	1
57	0	0
58	0	1
60	0	1
62	1	1
64	1	0
69	0	0
71	0	1
74	0	2
76	0	1
78	0	0
80	0	0
82	0	0
86	0	0
88	0	0
92	0	0
93	0	1
98	0	1
104	0	0

Survival Analysis 3: Non-Parametric Comparison of Survival Functions

Introduction

The Log-Rank Test

Generalized Wilcoxon Test

Exercise

The Log-Rank Test

Generalized Wilcoxon test

Comparison

Use survival R package

End thoughs

Use `survival` R package

Months	Status	Group	Group1	Group2
7	1	1	1	0
14	1	1	1	0
24	1	1	1	0
27	1	1	1	0
27	1	1	1	0
50	1	1	1	0
51	1	1	1	0
56	1	1	1	0
58	1	1	1	0
60	1	1	1	0
62	1	1	1	0
69	0	1	1	0
71	1	1	1	0
74	1	1	1	0
74	1	1	1	0
76	1	1	1	0
80	0	1	1	0
80	0	1	1	0
80	0	1	1	0
88	0	1	1	0
93	1	1	1	0
98	1	1	1	0
104	0	1	1	0
1	1	2	0	1
1	1	2	0	1
2	1	2	0	1
8	1	2	0	1
9	1	2	0	1
9	1	2	0	1
14	1	2	0	1
17	1	2	0	1
20	1	2	0	1
27	1	2	0	1
34	1	2	0	1
43	1	2	0	1
45	1	2	0	1
47	1	2	0	1
55	1	2	0	1
56	1	2	0	1
57	0	2	0	1
62	1	2	0	1
64	1	2	0	1
78	0	2	0	1
82	0	2	0	1
86	0	2	0	1
92	0	2	0	1

Months	Rt1	Rt2
1	23	23
2	23	21
7	23	20
8	22	20
9	22	19
14	22	17
17	21	16
20	21	15
24	21	14
27	20	14
34	18	13
43	18	12
45	18	11
47	18	10
50	18	9
51	17	9
55	16	9
56	16	8
57	15	7
58	15	6
60	14	6
62	13	6
64	12	5
69	12	4
71	11	4
74	10	4
76	8	4
78	7	4
80	7	3
82	4	3
86	4	2
88	4	1
92	3	1
93	3	0
98	2	0
104	1	0

Months	Observed.Group2	Observed.Group1
1	2	0
2	1	0
7	0	1
8	1	0
9	2	0
14	1	1
17	1	0
20	1	0
24	0	1
27	1	2
34	1	0
43	1	0
45	1	0
47	1	0
50	0	1
51	0	1
55	1	0
56	1	1
57	0	0
58	0	1
60	0	1
62	1	1
64	1	0
69	0	0
71	0	1
74	0	2
76	0	1
78	0	0
80	0	0
82	0	0
86	0	0
88	0	0
92	0	0
93	0	1
98	0	1
104	0	0