Bolin Wu
# Causal Inference 3: Inverse Probability of Treatment Weighting (IPTW)

# Intuition for inverse probability of treatment weighting (IPTW)

## Motivating example

## Weighting

# Marginal structural models

## Linear MSM

## Logistic MSM

## Estimation in MSMs

# Assessing balance

## Balance after weighting

## Distribution of weights

## Remedies for large weights

### Trimming the tails

# Data example in R

## Load the data and packages

## 1. Find the minimum and maximum weights.

## 2. Find the standardized differences for each confounder on the weighted (pseudo) population. What is the standardized difference for nodegree?

## 3. Using IPTW, find the estimate and 95% confidence interval for the average causal effect. This can be obtained from svyglm.

In this post we continue discussing the estimation of causal effects. We will cover the intuition behind IPTW and key concepts such as weighting and marginal structural models, and in the end we will show a data example in R.

Suppose there is a single binary confounder X.

Among people with X = 1, only 10% receive the treatment: P(A = 1 | X = 1) = 0.1. Among people with X = 0, 80% receive the treatment: P(A = 1 | X = 0) = 0.8.

Let us suppose there are 10 observations with X = 1: 1 in the treated group and 9 in the control group. With propensity score matching, we would match the one treated subject with one randomly selected subject from the control group, so that one treated subject counts the same as the 9 subjects from the control group. Rather than matching, we could use all of the data, but down-weight some subjects and up-weight others. The one treated subject should then end up with nine times more weight than each individual from the control group.

This is accomplished by weighting by the inverse of the probability of treatment received. This is known as inverse probability of treatment weighting (IPTW).

For treated subjects, weight by the inverse of P(A=1|X).

For control subjects, weight by the inverse of P(A=0|X).
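To make the toy example concrete, here is a quick numeric sketch in Python (the counts and probabilities are the ones from the motivating example above):

```python
# Toy example: single binary confounder X, binary treatment A.
# Among the 10 subjects with X = 1, P(A = 1 | X = 1) = 0.1,
# so 1 is treated and 9 are controls.
p = 0.1                    # P(A = 1 | X = 1)

w_treated = 1 / p          # weight for the treated subject: 10
w_control = 1 / (1 - p)    # weight for each control: ~1.11

# In the pseudo-population the two groups carry equal total weight,
# and the treated subject has 9x the weight of any single control.
total_treated = 1 * w_treated
total_control = 9 * w_control
print(total_treated, total_control)
```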

The main difference between propensity score matching and IPTW is that matching pairs subjects one-to-one, whereas IPTW uses all subjects. Both accomplish the same thing: for a given value of X (or of the propensity score), the weighted collection of treated subjects counts the same as the weighted collection of control subjects, as if it were a randomized trial.

A marginal structural model (MSM) is a model for the mean of the potential outcomes. "**Marginal**" indicates that it is not conditional on the confounders. "**Structural**" means that it is a model for potential outcomes, not observed outcomes.

It is defined as follows

$E(Y^{a}) = \psi_{0} + \psi_{1} a, \; a = 0,1$

$E(Y^{0}) = \psi_{0}$, $E(Y^{1}) = \psi_{0} + \psi_{1}$, $\psi_{1}$ is the **average causal effect** $E(Y^{1}) - E(Y^{0})$.

You would typically use a model like this if you had a continuous outcome.

$logit(E(Y^{a}) )= \psi_{0} + \psi_{1} a, \; a = 0,1$

Here $\exp(\psi_{1})$ is the **causal odds ratio**: $\frac{P(Y^{1} = 1) / (1 - P(Y^{1} = 1))}{P(Y^{0} =1) / (1 - P(Y^{0} = 1))}$.
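As a quick numeric check of the causal odds ratio (the potential-outcome probabilities below are made up purely for illustration):

```python
import math

# Assumed potential-outcome probabilities, chosen only for illustration.
p1 = 0.6   # P(Y^1 = 1)
p0 = 0.4   # P(Y^0 = 1)

# Causal odds ratio and the corresponding logistic-MSM slope psi_1.
causal_or = (p1 / (1 - p1)) / (p0 / (1 - p0))   # about 2.25
psi1 = math.log(causal_or)
print(causal_or, psi1)
```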

Steps:

- Estimate the propensity score.
- Create weights:
  - 1 divided by the propensity score for treated subjects.
  - 1 divided by (1 minus the propensity score) for control subjects.
- Specify the MSM of interest (e.g. for a continuous or binary outcome).
- Use software to fit a weighted generalized linear model.
- Use the asymptotic (sandwich) variance estimator or bootstrapping for standard errors.
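The steps above can be sketched end-to-end on simulated data. This is a minimal Python illustration, separate from the R analysis below: the data-generating process (a single binary confounder and a true causal effect of 2) is an assumption made for the demo, and the propensity score is estimated by stratum frequencies rather than a regression model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulate a single binary confounder X, treatment A, and outcome Y
# with a true average causal effect of 2 (all assumed for the demo).
x = rng.binomial(1, 0.5, n)
p_a = np.where(x == 1, 0.1, 0.8)        # P(A = 1 | X), as in the toy example
a = rng.binomial(1, p_a)
y = 2.0 * a + 1.5 * x + rng.normal(0.0, 1.0, n)

# Step 1: estimate the propensity score. With one binary covariate the
# stratum frequencies suffice; in practice use logistic regression.
ps = np.array([a[x == 0].mean(), a[x == 1].mean()])[x]

# Step 2: create the IPTW weights.
w = np.where(a == 1, 1 / ps, 1 / (1 - ps))

# Remaining steps: fit the linear MSM E(Y^a) = psi0 + psi1 * a. With a
# single binary treatment this reduces to a weighted difference in means.
psi0 = np.average(y[a == 0], weights=w[a == 0])
psi1 = np.average(y[a == 1], weights=w[a == 1]) - psi0
print(round(psi1, 2))   # should land near the true effect of 2
```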

Covariate balance can be checked on the weighted sample using standardized differences, compared across groups in a table or a plot. A common metric is the standardized mean difference (SMD).
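For instance, a weighted SMD can be computed directly. The helper below is a hypothetical sketch, not a function from any of the packages used later:

```python
import numpy as np

def weighted_smd(x, treat, w):
    """Standardized mean difference of covariate x between treatment
    groups, computed in the weighted (pseudo) population."""
    m1 = np.average(x[treat == 1], weights=w[treat == 1])
    m0 = np.average(x[treat == 0], weights=w[treat == 0])
    v1 = np.average((x[treat == 1] - m1) ** 2, weights=w[treat == 1])
    v0 = np.average((x[treat == 0] - m0) ** 2, weights=w[treat == 0])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

# Identical covariate distributions with equal weights give SMD = 0.
x = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
treat = np.array([1, 1, 1, 0, 0, 0])
print(weighted_smd(x, treat, np.ones(6)))   # 0.0
```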

Why are we interested in the distribution of weights? Larger weights lead to noisier estimates of causal effects. If one person's outcome data can greatly affect the parameter estimate, then the standard error will be large. One way to estimate standard errors is **bootstrapping**:

- Randomly sample with replacement from the original sample.
- Estimate parameters.
- Repeat steps 1 and 2 many times.
- The standard deviation of the bootstrap estimates is an estimate of the standard error.
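The bootstrap steps above can be sketched as follows, using the sample mean of simulated data as a simple stand-in for the causal-effect estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in estimator: the mean of an assumed sample of outcomes.
y = rng.normal(5.0, 2.0, 200)

boot_estimates = []
for _ in range(1000):
    idx = rng.integers(0, len(y), len(y))  # 1. resample with replacement
    boot_estimates.append(y[idx].mean())   # 2. re-estimate the parameter
# 3.-4. The SD of the bootstrap estimates approximates the standard error.
se_boot = np.std(boot_estimates)

# For the mean there is also an analytic standard error to compare against.
se_analytic = y.std(ddof=1) / np.sqrt(len(y))
print(round(se_boot, 3), round(se_analytic, 3))
```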

However, a subject with a very large weight will be included in some bootstrap samples but not others, and whether or not they are included will have a relatively large impact on the parameter estimates. Thus, much of the variability of the estimator would be due to this one subject. Large weights on a few individuals can therefore lead to very noisy estimates of the causal effect.

You can also use summary statistics of weights for examination.

A good first step is to look into **why** the weights are large.

Identify the subjects who have large weights.

- What is **unusual** about them?
- Is there a problem with their data?
- Is there a problem with the propensity score model?

Large weights occur at observations in the tails of the propensity score distribution. Trimming the tails can eliminate some of the extreme weights. A common trimming strategy:

- Remove treated subjects whose propensity scores are above the 98th percentile of the distribution among controls.
- Remove control subjects whose propensity scores are below the 2nd percentile of the distribution among treated subjects.
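This trimming rule can be sketched as follows (the propensity scores and treatment indicators are simulated for the demo; `hi` and `lo` are the two percentile cutoffs):

```python
import numpy as np

rng = np.random.default_rng(2)
# Assumed propensity scores and treatment indicators for illustration.
ps = np.clip(rng.beta(2, 2, 1000), 0.01, 0.99)
a = rng.binomial(1, ps)

# Percentile cutoffs come from the *other* group's PS distribution.
hi = np.quantile(ps[a == 0], 0.98)  # 98th percentile among controls
lo = np.quantile(ps[a == 1], 0.02)  # 2nd percentile among treated

# Keep treated subjects at or below hi and controls at or above lo.
keep = ~(((a == 1) & (ps > hi)) | ((a == 0) & (ps < lo)))
ps_trimmed, a_trimmed = ps[keep], a[keep]
print(len(ps), "->", len(ps_trimmed))
```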

Another approach is weight truncation. Steps:

- Determine a maximum allowable weight. It could be a specific value or a percentile.
- If a weight is greater than the maximum allowable value, set it to that maximum.

Whether or not to truncate weights involves a **bias-variance trade-off**: truncation introduces some bias but gives a smaller variance, while no truncation is unbiased but has a larger variance.
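Weight truncation itself is a one-liner. In the sketch below the propensity scores are simulated and the cap is the 99th percentile of the weight distribution, an arbitrary choice for the demo:

```python
import numpy as np

rng = np.random.default_rng(3)
# Assumed propensity scores for illustration; values of ps near 0
# (or of 1 - ps near 0) are what produce the extreme weights.
ps = rng.uniform(0.02, 0.98, 1000)
a = rng.binomial(1, ps)
w = np.where(a == 1, 1 / ps, 1 / (1 - ps))

# Truncate at the 99th percentile of the weights.
cap = np.quantile(w, 0.99)
w_truncated = np.minimum(w, cap)
print(round(w.max(), 2), "->", round(w_truncated.max(), 2))
```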

For this example we will use data from Lalonde (1986), which aimed to evaluate the impact of the National Supported Work (NSW) Demonstration, a labor training program, on post-intervention income levels. Interest is in estimating the causal effect of this training program on income.

The data are the same as described in the post of part 2.

```
# install.packages("tableone")
# install.packages("Matching")
# install.packages("ipw")
# install.packages("survey")
# install.packages("MatchIt")
library(tableone)
library(Matching)
library(ipw)
library(survey)
library(tidyverse)
library(MatchIt) # the data is in this package
data(lalonde)
str(lalonde) # check the type of data
```

```
## 'data.frame': 614 obs. of 9 variables:
## $ treat : int 1 1 1 1 1 1 1 1 1 1 ...
## $ age : int 37 22 30 27 33 22 23 32 22 33 ...
## $ educ : int 11 9 12 11 8 9 12 11 16 12 ...
## $ race : Factor w/ 3 levels "black","hispan",..: 1 2 1 1 1 1 1 1 1 3 ...
## $ married : int 1 0 0 0 0 0 0 0 0 1 ...
## $ nodegree: int 1 1 0 1 1 1 0 1 0 0 ...
## $ re74 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ re75 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ re78 : num 9930 3596 24909 7506 290 ...
```

```
# select the confounders
xvars = colnames(lalonde %>% select(-c("treat","re78")))
```

```
# propensity score model
psmodel = glm(treat~.-re78, family = binomial(link = "logit"),data = lalonde)
## Value of propensity score for each subject
ps = predict(psmodel, type = "response")
# create weights
weight = ifelse(lalonde$treat ==1, 1/ps, 1/ (1-ps))
cat("the maximum weight is", max(weight), "\n",
"the minimum weight is", min(weight))
```

```
## the maximum weight is 40.07729
## the minimum weight is 1.009163
```

```
# apply weights to data
weighteddata = svydesign(ids = ~1 , data = lalonde, weights = ~ weight)
# weighted table 1
weightedtable = svyCreateTableOne(vars = xvars, strata = "treat", data = weighteddata, test = FALSE)
# show the table with SMD
print(weightedtable, smd = TRUE)
```

```
## Stratified by treat
## 0 1 SMD
## n 616.00 553.63
## age (mean (SD)) 27.10 (10.80) 25.57 (6.53) 0.172
## educ (mean (SD)) 10.29 (2.74) 10.61 (2.05) 0.132
## race (%) 0.112
## black 245.1 (39.8) 247.9 (44.8)
## hispan 72.1 (11.7) 67.4 (12.2)
## white 298.8 (48.5) 238.3 (43.0)
## married (mean (SD)) 0.41 (0.49) 0.31 (0.47) 0.197
## nodegree (mean (SD)) 0.62 (0.48) 0.57 (0.50) 0.112
## re74 (mean (SD)) 4552.74 (6337.09) 2932.18 (5709.42) 0.269
## re75 (mean (SD)) 2172.04 (3160.14) 1658.07 (3072.89) 0.165
```

From the table above, we can see that the standardized difference for nodegree is 0.112. This is a bit large, since by the common rule of thumb we expect standardized differences to be smaller than 0.1.

```
# fit a marginal structural model
msm = svyglm(re78~treat, design = svydesign(~1, weights = ~weight, data = lalonde))
# find the confidence interval
confint(msm)
```

```
## 2.5 % 97.5 %
## (Intercept) 5706.948 7138.730
## treat -1559.321 2008.673
```

```
# find the estimated coefficients
coef(msm)
```

```
## (Intercept) treat
## 6422.8390 224.6763
```

From this preliminary analysis, the point estimate suggests that the training program increases earnings by about 224 USD. However, the 95% confidence interval (−1559, 2009) includes zero, so this effect is not statistically significant.
