Bolin Wu

# Causal Inference 4: Instrumental variable

Contents:

- Introduction to instrumental variables
  - Encouragement design
- Randomized trials with noncompliance
- Compliance classes
  - Local average treatment effect
  - Identifiability
- Assumptions
  - Assumptions about IVs
  - Monotonicity assumption
- Causal effect identification and estimation
- Two stage least squares
- Data example in R
  - Study motivation
- Summary

Instrumental variable (IV) methods are an alternative approach to causal inference that does not rely on the ignorability assumption.

Let us think of an example:

- A: smoking during pregnancy (yes or no)
- Y: birthweight
- X: parity, mother's age, weight, etc.

**Concern**: There could be unmeasured confounders.

**Challenge**: It is not ethical to randomly assign smoking to pregnant women.

So even after adjusting for measured covariates such as the mother's age, whether she has given birth before, or her weight, there could still be unmeasured confounders. What we could do instead is use an **encouragement** design.

Add Z: subjects are randomized to either receive **encouragement** to stop smoking (Z = 1) or receive usual care (Z = 0). An intention-to-treat analysis would focus on the **causal effect of encouragement**:

$E(Y^{Z=1}) - E(Y^{Z=0})$

This is a valid causal effect and would likely be of some interest.
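To make the ITT concrete, here is a minimal R sketch on simulated data. Every number in it (sample size, effect sizes, and the unmeasured confounder `u`) is a made-up assumption for illustration, not taken from a real study. The ITT estimate is simply the difference in mean outcome between the two randomized arms.

```r
# Simulated encouragement trial; all parameters are hypothetical.
set.seed(1)
n <- 10000
z <- rbinom(n, 1, 0.5)                  # randomized: 1 = encouragement to stop smoking
u <- rnorm(n)                           # unmeasured confounder
a <- as.integer(u + rnorm(n) - z > 0)   # smoking (1 = yes); encouragement lowers it
y <- 3400 - 200 * a - 100 * u + rnorm(n, sd = 300)  # birthweight in grams

# Intention-to-treat effect: compare randomized arms, ignoring smoking status
itt_hat <- mean(y[z == 1]) - mean(y[z == 0])
itt_hat
```

Because Z is randomized, this difference estimates the effect of encouragement even though `u` is never observed; it does not by itself give the effect of smoking.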

What can we say about the **causal effect of smoking** itself? This is the focus of IV methods.

We can begin by imagining a randomized trial:

- Z: randomized to treatment (1 if randomized to treatment, otherwise 0)
- A: treatment received (1 if receive treatment, otherwise 0)
- Y: outcome
- Please note that not everyone assigned treatment will actually receive the treatment (non-compliance).

Essentially, noncompliance makes a randomized trial like an observational study: there could be confounding of the effect of **treatment received** on the outcome. It might be reasonable to assume that treatment assignment does not directly affect Y. Here Z can be thought of as (strong) **encouragement** to receive the treatment.

We can classify people based on their potential treatment received: $A^0$ if assigned Z = 0 and $A^1$ if assigned Z = 1.

$A^0$ | $A^1$ | Label |
---|---|---|
0 | 0 | Never-takers |
0 | 1 | Compliers |
1 | 0 | Defiers |
1 | 1 | Always-takers |

Each number in the table shows whether the person actually takes the treatment. Take the first row as an example: when not assigned to treatment, the person does not take it (0), and when assigned to treatment, the person still does not take it (0). Therefore we call this person a never-taker. The other three rows are interpreted the same way.
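The table can be written as a small lookup. This is only an illustration: `compliance_class` is a hypothetical helper name, and it assumes the inputs are 0/1 vectors.

```r
# Label compliance classes from potential treatments A^0 (if Z = 0) and A^1 (if Z = 1).
# Inputs are assumed to be 0/1 vectors of equal length.
compliance_class <- function(a0, a1) {
  ifelse(a0 == 0 & a1 == 0, "never-taker",
         ifelse(a0 == 0 & a1 == 1, "complier",
                ifelse(a0 == 1 & a1 == 0, "defier", "always-taker")))
}

compliance_class(a0 = c(0, 0, 1, 1), a1 = c(0, 1, 0, 1))
# returns "never-taker" "complier" "defier" "always-taker"
```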

A motivation for using IV methods in general is concern about possible **unmeasured confounding**. If there is unmeasured confounding, then we cannot marginalize over all confounders using matching, IPTW, etc.

IV methods do not focus on the average causal effect for the population. Instead, they focus on a **local average treatment effect**.

The target of inference is

$\begin{aligned} &E(Y^{Z=1}|A^{0} = 0, A^{1} =1) - E(Y^{Z=0}|A^{0} = 0, A^{1} = 1) \\ &= E(Y^{Z=1} - Y^{Z = 0}|compliers)\\ &= E(Y^{A=1} - Y^{A = 0}|compliers)\\ \end{aligned}$

This is causal because it contrasts counterfactuals in a common population. This is known as **complier average causal effect (CACE)**.

- This is a causal effect in a subpopulation, and that is why we call it "local".
- No inference about defiers, always-takers, or never-takers.
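A simulation makes the "local" part visible. In the hypothetical R sketch below we generate both potential outcomes for everyone, which is impossible with real data, and let the treatment effect differ by compliance class (2 for compliers, 1 otherwise; both values and the class proportions are arbitrary assumptions). The CACE then differs from the population average treatment effect.

```r
set.seed(2)
n <- 10000
# Latent compliance class (no defiers); proportions are made up
class <- sample(c("never-taker", "complier", "always-taker"), n,
                replace = TRUE, prob = c(0.3, 0.5, 0.2))
y0 <- rnorm(n, mean = 10)                     # potential outcome Y^{A=0}
effect <- ifelse(class == "complier", 2, 1)   # individual effect, by class
y1 <- y0 + effect                             # potential outcome Y^{A=1}

cace_true <- mean(y1[class == "complier"] - y0[class == "complier"])
ate_true <- mean(y1 - y0)                     # average over everyone
c(cace = cace_true, ate = ate_true)           # CACE is 2; ATE is about 1.5
```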

In the real world, for each person we observe only Z and A, not the pair $(A^0, A^1)$: the observed A reveals one potential treatment, and the other is missing.

Z | A | $A^0$ | $A^1$ | Class |
---|---|---|---|---|
0 | 0 | 0 | ? | Never-takers or compliers |
0 | 1 | 1 | ? | Always-takers or defiers |
1 | 0 | ? | 0 | Never-takers or defiers |
1 | 1 | ? | 1 | Always-takers or compliers |

Without additional assumptions, we cannot classify each subject into one of these categories. However, we can narrow it down to two options.
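The observed-data table can also be written as a lookup that returns the two classes consistent with an observed (Z, A) pair. `possible_classes` is a hypothetical helper for illustration, and it assumes 0/1 inputs.

```r
# Compliance classes consistent with observed assignment z and treatment a (both 0/1),
# without any extra assumptions.
possible_classes <- function(z, a) {
  if (z == 0 && a == 0) {
    c("never-taker", "complier")
  } else if (z == 0 && a == 1) {
    c("always-taker", "defier")
  } else if (z == 1 && a == 0) {
    c("never-taker", "defier")
  } else {
    c("always-taker", "complier")
  }
}

possible_classes(z = 1, a = 1)   # "always-taker" "complier"
```

Once the monotonicity assumption rules out defiers, the pairs (Z = 0, A = 1) and (Z = 1, A = 0) each point to a single class, but the other two pairs still mix compliers with another class.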

Compliance classes are also known as **principal strata**. These are latent, not directly observable. In the next section we will talk about how we **estimate** the complier average causal effect and what assumptions are needed.

A variable Z is an instrumental variable (IV) if:

- It is associated with the treatment A;
- It affects the outcome Y only through its effect on treatment: Z affects A, but it must not directly affect Y. This is known as the **exclusion restriction**.

We are interested in the compliers, but without further assumptions defiers cannot be separated from the other classes (in the table above, they share an observed group with always-takers or never-takers). Therefore we need another assumption. The **monotonicity assumption** is that there are no defiers.

* No one consistently does the opposite of what they are told.

* It is called monotonicity because the assumption is that the probability of treatment does not decrease with more encouragement.
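Why the assumption matters can be seen in a hypothetical simulation. The R sketch below adds 10% defiers (all proportions and effect sizes are made-up assumptions) and applies the ratio of the ITT on Y to the ITT on A, which is the estimator derived in the next section. Because defiers move in the opposite direction, the ratio no longer recovers the complier effect of 2.

```r
set.seed(3)
n <- 200000
class <- sample(c("never-taker", "complier", "always-taker", "defier"), n,
                replace = TRUE, prob = c(0.3, 0.4, 0.2, 0.1))
z <- rbinom(n, 1, 0.5)                                       # randomized assignment
a0 <- as.integer(class %in% c("always-taker", "defier"))     # A^0
a1 <- as.integer(class %in% c("always-taker", "complier"))   # A^1
a <- ifelse(z == 1, a1, a0)                                  # observed treatment
effect <- ifelse(class == "defier", -3, 2)                   # effect of A on Y
y <- rnorm(n) + effect * a

ratio <- (mean(y[z == 1]) - mean(y[z == 0])) /
  (mean(a[z == 1]) - mean(a[z == 0]))
ratio   # about (0.4 * 2 + 0.1 * 3) / (0.4 - 0.1) = 3.67, not 2
```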

In this section we are going to discuss identification and estimation of causal effects from instrumental variable type of analysis.

Recall that the goal is to estimate $E(Y^{A=1} - Y^{A = 0}|compliers)$. Let's begin with something we can identify, the intention to treat (ITT) effect:

$E(Y^{Z=1} - Y^{Z = 0}) = E(Y|Z=1) - E(Y|Z=0)$

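Since Z has no effect on always-takers or never-takers (their treatment does not change with Z, and by the exclusion restriction Z does not affect Y directly), and since there are no defiers under monotonicity, we can derive the following result: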

$\begin{aligned} E(Y|Z=1) - E(Y|Z=0) = &E(Y|Z = 1, compliers)P(compliers) \\ &- E(Y|Z = 0, compliers)P(compliers) \end{aligned}$

which implies

$\begin{aligned} \frac{E(Y|Z=1) - E(Y|Z=0)}{P(compliers)} &= E(Y|Z = 1, compliers) - E(Y|Z = 0, compliers)\\ &= E(Y^{A=1}|compliers) - E(Y^{A=0}|compliers)\\ &= CACE \end{aligned}$
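The first result above skips a step. Because Z is randomized, the class proportions do not depend on $z$, and under monotonicity there is no defier term, so for $z \in \{0, 1\}$:

$\begin{aligned} E(Y|Z=z) = &E(Y|Z = z, \text{always-takers})P(\text{always-takers}) \\ &+ E(Y|Z = z, \text{never-takers})P(\text{never-takers}) \\ &+ E(Y|Z = z, \text{compliers})P(\text{compliers}) \end{aligned}$

For always-takers and never-takers, Z changes neither the treatment received nor (by the exclusion restriction) the outcome, so their terms are the same for $z = 0$ and $z = 1$ and cancel in $E(Y|Z=1) - E(Y|Z=0)$, leaving only the complier terms.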

Note that, by randomization and monotonicity, $E(A|Z=1)$ is the proportion of people who are **always-takers or compliers** and $E(A|Z=0)$ is the proportion of people who are **always-takers**. Therefore P(compliers) is just $E(A|Z=1) - E(A|Z=0)$.
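A quick hypothetical check in R: simulate latent classes with a known complier share (50% here, an arbitrary choice, with no defiers), then compare $E(A|Z=1) - E(A|Z=0)$ with the true share.

```r
set.seed(4)
n <- 100000
class <- sample(c("never-taker", "complier", "always-taker"), n,
                replace = TRUE, prob = c(0.3, 0.5, 0.2))
z <- rbinom(n, 1, 0.5)
# Always-takers take treatment, never-takers do not, compliers follow z
a <- ifelse(class == "always-taker", 1, ifelse(class == "complier", z, 0))

p_compliers_hat <- mean(a[z == 1]) - mean(a[z == 0])
c(estimated = p_compliers_hat, true = mean(class == "complier"))   # both near 0.5
```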

We can derive the expression of CACE as follows:

$\begin{aligned} CACE &= \frac{E(Y|Z = 1) - E(Y|Z = 0)}{E(A|Z = 1) - E(A|Z = 0)} \end{aligned}$

The numerator is the ITT: the causal effect of treatment assignment on the outcome. The denominator is the causal effect of treatment assignment on treatment received.
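The ratio can be computed directly from data. The R sketch below wraps it in a hypothetical helper `cace_ratio` and applies it to simulated data in which an unmeasured confounder `u` drives both the compliance class and the outcome (the data-generating values are assumptions; the true effect of A is 2 for everyone). The naive comparison of treated and untreated subjects is badly biased, while the ratio is close to 2.

```r
# Ratio estimator: ITT on the outcome divided by ITT on treatment received
cace_ratio <- function(y, a, z) {
  itt_y <- mean(y[z == 1]) - mean(y[z == 0])   # numerator: effect of Z on Y
  itt_a <- mean(a[z == 1]) - mean(a[z == 0])   # denominator: effect of Z on A
  itt_y / itt_a
}

set.seed(5)
n <- 100000
u <- rnorm(n)                                  # unmeasured confounder
z <- rbinom(n, 1, 0.5)                         # randomized assignment
# High-u subjects always take treatment, low-u never do, the rest comply
a <- ifelse(u > 0.8, 1, ifelse(u < -0.5, 0, z))
y <- 1 + 2 * a + 3 * u + rnorm(n)              # outcome also depends on u

naive <- mean(y[a == 1]) - mean(y[a == 0])
cace_hat <- cace_ratio(y, a, z)
c(naive = naive, cace = cace_hat)
```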

Note:

- With perfect compliance, CACE = ITT.
- Under monotonicity, the denominator is always between 0 and 1. Thus, the CACE is at least as large as the ITT in absolute value: the ITT is an **underestimate** of the CACE in magnitude, because some people assigned to treatment did not take it.

Two-stage least squares (2SLS) is a method for estimating a causal effect when you have an instrumental variable.

The first stage is to regress treatment received, A, on the instrumental variable, Z:

$A_{i} = \alpha_{0} + Z_{i} \alpha_{1} + \epsilon_{i}$

where the error term has mean 0 and constant variance. By randomization, $Z_{i}$ and $\epsilon_{i}$ are independent.

After that we can obtain the predicted value of A given Z: $\hat{A_{i}} = \hat{\alpha}_{0} + Z_{i}\hat{\alpha_{1}}$

The second stage is to regress the outcome, Y, on the fitted value from stage 1, $\hat{A_{i}}$:

$Y_{i} = \beta_{0} + \hat{A_{i}} \beta_{1} + \epsilon_{i}$

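The estimate of $\beta_{1}$ is the estimate of the causal effect. With a single binary instrument and no covariates, it equals the ratio estimator of the CACE from the previous section.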
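The two stages can be run with base R's `lm`. In the sketch below (simulated data with an unmeasured confounder `u`; all values are hypothetical), the second-stage slope matches the ratio estimator up to floating-point rounding, as it should with a single binary instrument and no covariates.

```r
set.seed(6)
n <- 100000
u <- rnorm(n)                                    # unmeasured confounder
z <- rbinom(n, 1, 0.5)                           # randomized instrument
a <- ifelse(u > 0.8, 1, ifelse(u < -0.5, 0, z))  # treatment received
y <- 1 + 2 * a + 3 * u + rnorm(n)                # true effect of A is 2

stage1 <- lm(a ~ z)                              # stage 1: regress A on Z
a_hat <- fitted(stage1)                          # predicted A given Z
stage2 <- lm(y ~ a_hat)                          # stage 2: regress Y on predicted A
beta1 <- unname(coef(stage2)["a_hat"])

ratio <- (mean(y[z == 1]) - mean(y[z == 0])) /
  (mean(a[z == 1]) - mean(a[z == 0]))
c(two_stage = beta1, ratio = ratio)              # identical up to rounding
```

Only the point estimate should be read from `stage2`: its standard errors treat `a_hat` as observed data rather than as estimated, so they are not valid for the causal effect.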

This section is about how to carry out an instrumental variable analysis in R. The variables are as follows:

- Z: indicator that the subject grew up near a 4-year college
- A: subject's years of education
- Y: subject's income
- X: covariates such as parents' years of education, region of the country, age, race, IQ score from a test in high school, etc.

More schooling is associated with higher income, but is that because people with more schooling are different in other ways? That is, we are concerned about measured and unmeasured confounding.

One proposal is to use proximity to a college as an IV. Living near a 4-year college is a type of **encouragement**.

```
# install.packages("ivpack")
library(ivpack)
data("card.data")
# IV is nearc4 (near 4 year college)
# outcome is lwage (log of wage)
# 'treatment' is educ (number of years of education)
# you can take a look at descriptive statistics of variables, but we skip this step here.
# make education binary
educ12 = card.data$educ>12
# estimate proportion of compliers
compl_prop = mean(educ12[card.data$nearc4==1]) - mean(educ12[card.data$nearc4==0])
cat("The proportion of complier is", compl_prop)
```

```
## The proportion of complier is 0.1219293
```

The estimated proportion of compliers is only about 12%, so the instrument is not extremely strong, but not very weak either. Living near a four-year college does seem to increase the chance of getting more than 12 years of education.

```
# intention to treat effect
itt = mean(card.data$lwage[card.data$nearc4==1]) - mean(card.data$lwage[card.data$nearc4==0])
cat("The intention to treat effect is", itt)
```

```
## The intention to treat effect is 0.1559075
```

```
cat("The complier average causal effect is", itt/compl_prop)
```

```
## The complier average causal effect is 1.278672
```

```
# two stage least squares
## Stage 1: regress A on Z
s1 = lm(educ12 ~ card.data$nearc4)
## get predicted value of A given Z for each subject
predtx = predict(s1,type = 'response')
table(predtx)
```

```
## predtx
## 0.422152560083588 0.54408183146614
## 957 2053
```

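People who had encouragement (grew up near a 4-year college; 2053 subjects) have a predicted probability of 0.54 of having more than 12 years of education, compared with 0.42 for those who did not (957 subjects). The difference, about 0.122, is the estimated proportion of compliers from before.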

```
# stage 2: regress Y on predicted value of A
lm(card.data$lwage~predtx)
```

```
##
## Call:
## lm(formula = card.data$lwage ~ predtx)
##
## Coefficients:
## (Intercept) predtx
## 5.616 1.279
```

We can see that the CACE is 1.279. It is **the same** as the one we estimated before by hand.

That brings us to the end of the causal inference series. We have gone from the definition of causal effects to their estimation. Part 2 talks about matching, which is used to estimate causal effects when we do not have randomized trials. Part 3 is about inverse probability of treatment weighting (IPTW), which reweights subjects so that measured covariates are balanced between treatment groups. In this post, we introduced instrumental variables to cope with the situation when the ignorability assumption is not fulfilled.

My biggest impression after the past 2 months of study is that statisticians have made great efforts to develop causal inference and make it as applicable to the real world as possible. And I hope readers will remember that correlation does not imply causation!