In this post we continue the discussion of estimating causal effects. We cover the intuition behind IPTW, some key definitions such as weighting and marginal structural models, and end with a data example in R.
Suppose there is a single binary confounder X.
Among people with X = 1, only 10% receive the treatment: P(A = 1 | X = 1) = 0.1. Among people with X = 0, 80% receive the treatment: P(A = 1 | X = 0) = 0.8.
Let us suppose there are 10 observations with X = 1, with 1 in the treated group and 9 in the control group. With propensity score matching, we would match the one treated subject with one randomly selected subject from the control group; that one treated subject counts the same as any one of the 9 controls. Rather than matching, we could use all of the data, but down-weight some subjects and up-weight others. The one treated subject should end up with nine times more weight than each individual from the control group.
This is accomplished by weighting by the inverse of the probability of treatment received. This is known as inverse probability of treatment weighting (IPTW).
For treated subjects, weight by the inverse of P(A=1|X).
For control subjects, weight by the inverse of P(A=0|X).
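To make this concrete, here is a minimal sketch in R of the weights for the toy setting above (the function name `iptw_weight` is just for illustration):

```r
# Treatment probabilities from the toy example:
# P(A = 1 | X = 1) = 0.1 and P(A = 1 | X = 0) = 0.8
p_a1 <- c("0" = 0.8, "1" = 0.1)

# Weight = inverse probability of the treatment actually received
iptw_weight <- function(a, x) {
  p <- unname(p_a1[as.character(x)])
  ifelse(a == 1, 1 / p, 1 / (1 - p))
}

iptw_weight(a = 1, x = 1)  # 10: the one treated subject with X = 1
iptw_weight(a = 0, x = 1)  # about 1.11: each of the 9 controls with X = 1
```

The treated subject's weight (10) is nine times each control's weight (10/9), matching the intuition above.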
The main difference is that propensity score matching is one-to-one, whereas IPTW uses the whole sample: for a given value of X (or of the propensity score), the collection of treated subjects is given the same total weight as the collection of control subjects, as if treatment had been randomized.
A marginal structural model (MSM) is a model for the mean of the potential outcomes. "Marginal" indicates that it is not conditional on the confounders. "Structural" means that it is a model for potential outcomes, not observed outcomes.
A linear MSM is defined as follows:

E(Y^a) = psi0 + psi1 * a, for a = 0, 1,

where psi1 = E(Y^1) - E(Y^0) is the average causal effect.
You would typically use a model like this if you had a continuous outcome.
For a binary outcome, a logistic MSM can be used:

logit(E(Y^a)) = psi0 + psi1 * a, for a = 0, 1,

where exp(psi1) is the causal odds ratio:

exp(psi1) = [P(Y^1 = 1) / (1 - P(Y^1 = 1))] / [P(Y^0 = 1) / (1 - P(Y^0 = 1))].
Steps for IPTW estimation of the parameters of an MSM:

1. Estimate the propensity score P(A = 1 | X), e.g., with logistic regression.
2. Create weights: 1/PS for treated subjects and 1/(1 - PS) for controls.
3. Fit the MSM via a weighted regression of the observed outcome on treatment.
Covariate balance can be checked on the weighted sample by comparing standardized differences across groups in a table or in a plot. A common metric is the standardized mean difference (SMD).
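As a sketch, the weighted SMD for a single covariate can be computed by hand (the function name `smd_weighted` and the pooled-SD convention below are one common choice, not the only one):

```r
# Sketch: weighted standardized mean difference for one covariate.
# x: covariate, treat: 0/1 treatment indicator, w: IPTW weights
smd_weighted <- function(x, treat, w) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])  # weighted mean, treated
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])  # weighted mean, controls
  v1 <- weighted.mean((x[treat == 1] - m1)^2, w[treat == 1])
  v0 <- weighted.mean((x[treat == 0] - m0)^2, w[treat == 0])
  abs(m1 - m0) / sqrt((v1 + v0) / 2)  # difference over pooled SD
}
```

With all weights equal, this reduces to the unweighted SMD; values below roughly 0.1 are usually taken to indicate adequate balance.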
Why are we interested in the distribution of weights? Larger weights lead to noisier estimates of causal effects. If one person's outcome data can greatly affect the parameter estimate, then the standard error will be large. One way to estimate standard errors is bootstrapping: repeatedly sample subjects with replacement, re-estimate the weights and the causal effect in each bootstrap sample, and take the standard deviation of the bootstrap estimates.
However, a subject with a very large weight will be included in some bootstrap samples but not others, and whether or not they are included will have a relatively large impact on the parameter estimate. Thus, much of the variability of the estimator would be due to this one subject. Large weights on a few individuals can lead to a very noisy estimator of the causal effect.
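As a sketch of that procedure, here is a bootstrap of an IPTW effect estimate on simulated data (the data-generating values and the helper name `iptw_effect` are illustrative; a weighted `lm` gives the same point estimate as a survey-weighted fit):

```r
# Simulate a confounded data set: x confounds both treatment a and outcome y.
set.seed(1)
n <- 500
x <- rbinom(n, 1, 0.5)
a <- rbinom(n, 1, ifelse(x == 1, 0.1, 0.8))  # treatment depends on x
y <- 2 * a + 3 * x + rnorm(n)                # true causal effect of a is 2
dat <- data.frame(x = x, a = a, y = y)

# IPTW point estimate: fit the propensity score, weight, run weighted regression
iptw_effect <- function(d) {
  ps <- fitted(glm(a ~ x, family = binomial(), data = d))
  w <- ifelse(d$a == 1, 1 / ps, 1 / (1 - ps))
  coef(lm(y ~ a, data = d, weights = w))["a"]
}

# Bootstrap: resample subjects, re-estimate weights and effect each time
boot_est <- replicate(200, iptw_effect(dat[sample(n, replace = TRUE), ]))
sd(boot_est)  # bootstrap standard error of the causal effect estimate
```

Note that the propensity score model is re-fit inside each bootstrap sample, so the uncertainty from estimating the weights is reflected in the standard error.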
You can also examine summary statistics of the weights.
A good first step is to look into why the weights are large.
Identify the subjects who have large weights.
Large weights occur at observations in the tails of the propensity score distribution. Trimming the tails can eliminate some of the extreme weights. A common trimming strategy is to drop subjects whose propensity score falls outside the region of overlap, for example below the 2nd percentile of scores among the treated or above the 98th percentile among the controls.
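A minimal sketch of this trimming rule in R (the 2nd/98th percentile cutoffs and the name `trim_ps` are illustrative choices, not a fixed standard):

```r
# Sketch: keep only subjects whose propensity score lies between the 2nd
# percentile among the treated and the 98th percentile among the controls.
trim_ps <- function(data, ps, treat) {
  lower <- quantile(ps[treat == 1], 0.02)  # 2nd percentile, treated group
  upper <- quantile(ps[treat == 0], 0.98)  # 98th percentile, control group
  keep <- ps >= lower & ps <= upper
  data[keep, , drop = FALSE]
}
```

After trimming, the propensity score and the weights should be re-estimated on the trimmed sample.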
Another approach is weight truncation. Steps: first choose a maximum allowable weight, either as an absolute value (e.g., 10) or as a percentile of the weight distribution (e.g., the 99th percentile); then set any weight larger than this maximum equal to the maximum.
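These two steps can be sketched in a few lines of base R (the 99th-percentile default and the name `truncate_weights` are illustrative):

```r
# Sketch: truncate (cap) large weights at a chosen percentile of the
# observed weight distribution.
truncate_weights <- function(w, q = 0.99) {
  cap <- quantile(w, q)  # step 1: choose the maximum allowable weight
  pmin(w, cap)           # step 2: any weight above the cap is set to the cap
}

truncate_weights(c(1.1, 1.3, 2.0, 40))  # the extreme weight 40 gets capped
```

Truncation trades a little bias (the capped subjects are no longer exactly inverse-probability weighted) for a reduction in variance.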
For this example we will use data from Lalonde (1986), which aimed to
evaluate the impact of the National Supported Work (NSW) Demonstration,
a labor training program, on post-intervention income levels.
Interest is in estimating the causal effect of this training program on
income. The data are the same as described in part 2 of this series.
# install.packages("tableone")
# install.packages("Matching")
# install.packages("ipw")
# install.packages("survey")
# install.packages("MatchIt")
library(tableone)
library(Matching)
library(ipw)
library(survey)
library(tidyverse)
library(MatchIt) # the data is in this package
data(lalonde)
str(lalonde) # check the type of data
## 'data.frame': 614 obs. of 9 variables:
## $ treat : int 1 1 1 1 1 1 1 1 1 1 ...
## $ age : int 37 22 30 27 33 22 23 32 22 33 ...
## $ educ : int 11 9 12 11 8 9 12 11 16 12 ...
## $ race : Factor w/ 3 levels "black","hispan",..: 1 2 1 1 1 1 1 1 1 3 ...
## $ married : int 1 0 0 0 0 0 0 0 0 1 ...
## $ nodegree: int 1 1 0 1 1 1 0 1 0 0 ...
## $ re74 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ re75 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ re78 : num 9930 3596 24909 7506 290 ...
# select the confounders
xvars = colnames(lalonde %>% select(-c("treat","re78")))
# propensity score model
psmodel = glm(treat~.-re78, family = binomial(link = "logit"),data = lalonde)
## Value of propensity score for each subject
ps = predict(psmodel, type = "response")
# create weights
weight = ifelse(lalonde$treat ==1, 1/ps, 1/ (1-ps))
cat("the maximum weight is", max(weight), "\n",
"the minimum weight is", min(weight))
## the maximum weight is 40.07729
## the minimum weight is 1.009163
# apply weights to data
weighteddata = svydesign(ids = ~1 , data = lalonde, weights = ~ weight)
# weighted table 1
weightedtable = svyCreateTableOne(vars = xvars, strata = "treat", data = weighteddata, test = FALSE)
# show the table with SMD
print(weightedtable, smd = TRUE)
## Stratified by treat
## 0 1 SMD
## n 616.00 553.63
## age (mean (SD)) 27.10 (10.80) 25.57 (6.53) 0.172
## educ (mean (SD)) 10.29 (2.74) 10.61 (2.05) 0.132
## race (%) 0.112
## black 245.1 (39.8) 247.9 (44.8)
## hispan 72.1 (11.7) 67.4 (12.2)
## white 298.8 (48.5) 238.3 (43.0)
## married (mean (SD)) 0.41 (0.49) 0.31 (0.47) 0.197
## nodegree (mean (SD)) 0.62 (0.48) 0.57 (0.50) 0.112
## re74 (mean (SD)) 4552.74 (6337.09) 2932.18 (5709.42) 0.269
## re75 (mean (SD)) 2172.04 (3160.14) 1658.07 (3072.89) 0.165
From the table above, we can see that the standardized difference for
nodegree is 0.112. This is a bit large, since the common rule of thumb
is to aim for standardized differences below 0.1.
# fit a marginal structural model
msm = svyglm(re78~treat, design = svydesign(~1, weights = ~weight, data = lalonde))
# find the confidence interval
confint(msm)
## 2.5 % 97.5 %
## (Intercept) 5706.948 7138.730
## treat -1559.321 2008.673
# find the estimated coefficients
coef(msm)
## (Intercept) treat
## 6422.8390 224.6763
From this preliminary analysis, the point estimate suggests that the training program raises earnings by about 225 USD; however, the 95% confidence interval (-1559, 2009) includes zero, so there is no statistically significant evidence of a causal effect.