Bolin Wu

Causal Inference 4: Instrumental variable

2021-06-03 · 13 min read
R causal inference

The instrumental variables (IV) approach is an alternative causal inference method that does not rely on the ignorability assumption.

Introduction to instrumental variables

Let us think of an example. A: smoking during pregnancy (yes or no); Y: birthweight; X: parity, mother's age, weight, etc.
Concern: There could be unmeasured confounders.
Challenge: It is not ethical to randomly assign smoking to pregnant women.

So even after adjusting for covariates such as the mother's age, whether she has given birth before, or her weight, there could still be unmeasured confounders. What we could do instead is use an encouragement design.

Encouragement design

We add a variable Z: each woman is randomized to either receive encouragement to stop smoking (Z = 1) or receive usual care (Z = 0). An intention-to-treat analysis would focus on the causal effect of encouragement:

$$E(Y^{Z=1}) - E(Y^{Z=0})$$

This is a valid causal effect and would likely be of some interest.

What can we say about the causal effect of smoking itself? This is the focus of IV methods.

Randomized trials with noncompliance

We can begin by imagining a randomized trial:

  • Z: randomized to treatment (1 if randomized to treatment, otherwise 0)
  • A: treatment received (1 if treatment was received, otherwise 0)
  • Y: outcome
  • Please note that not everyone assigned to treatment will actually receive it (non-compliance).

Essentially the non-compliance makes a randomized trial like an observational study. There could be confounding based on treatment received. It might be reasonable to assume that treatment assignment does not directly affect Y. Here Z can be thought of as (strong) encouragement to receive the treatment.

Compliance classes

We can classify people based on potential treatment.

| $A^0$ | $A^1$ | Label |
|---|---|---|
| 0 | 0 | Never-takers |
| 0 | 1 | Compliers |
| 1 | 0 | Defiers |
| 1 | 1 | Always-takers |

The numbers in the table indicate whether a person would actually take the treatment (1) or not (0) under each assignment. Take the first row as an example: when this person is not assigned to treatment, they do not receive it (0), and when they are assigned to treatment, they still do not take it (0). Therefore we call such people never-takers. The other three rows are interpreted in the same way.
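The table can be written as a small lookup function. This is only an illustrative sketch: the function name `compliance_class` and its arguments are hypothetical, and in practice the pair of potential treatments is never fully observed for one person.

```r
# Map a pair of potential treatments to its compliance class.
# a0: treatment taken if assigned Z = 0; a1: treatment taken if assigned Z = 1.
# (Hypothetical helper, for illustration only.)
compliance_class <- function(a0, a1) {
  ifelse(a0 == 0 & a1 == 0, "never-taker",
  ifelse(a0 == 0 & a1 == 1, "complier",
  ifelse(a0 == 1 & a1 == 0, "defier", "always-taker")))
}

# The four rows of the table, in order
compliance_class(a0 = c(0, 0, 1, 1), a1 = c(0, 1, 0, 1))
```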

A motivation for using IV methods in general is concern about possible unmeasured confounding. If there is unmeasured confounding, then we cannot adjust for all confounders by matching, IPTW, etc.

IV methods do not focus on the average causal effect for the population. Instead, they focus on a local average treatment effect.

Local average treatment effect

The target of inference is

$$
\begin{aligned}
&E(Y^{Z=1} \mid A^{0} = 0, A^{1} = 1) - E(Y^{Z=0} \mid A^{0} = 0, A^{1} = 1) \\
&= E(Y^{Z=1} - Y^{Z=0} \mid \text{compliers}) \\
&= E(Y^{A=1} - Y^{A=0} \mid \text{compliers})
\end{aligned}
$$

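This is causal because it contrasts counterfactuals in a common population. The last equality holds because compliers take the treatment exactly when they are assigned to it, so for them setting Z is the same as setting A. This quantity is known as the complier average causal effect (CACE).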

  • This is a causal effect in a subpopulation, and that is why we call it "local".
  • No inference about defiers, always-takers, or never-takers.

In the real world, for each person we observe an A and a Z, not $(A^0, A^1)$.

| Z | A | $A^0$ | $A^1$ | Class |
|---|---|---|---|---|
| 0 | 0 | 0 | ? | Never-takers or compliers |
| 0 | 1 | 1 | ? | Always-takers or defiers |
| 1 | 0 | ? | 0 | Never-takers or defiers |
| 1 | 1 | ? | 1 | Always-takers or compliers |

Without additional assumptions, we cannot classify each subject into one of these categories. However, we can narrow it down to two options.


Compliance classes are also known as principal strata. These are latent, not directly observable. In the next section we will talk about how we estimate the complier average causal effect and what assumptions are needed.


Assumptions about IVs

A variable is an instrumental variable (IV) if:

  1. It is associated with the treatment;
  2. It affects the outcome only through its effect on treatment;
    • Z affects A, but it must not directly affect Y.
    • This is known as the exclusion restriction.

Monotonicity assumption

Above we concluded that the compliers are the subgroup of interest. The classes that include defiers are not of interest, so we need another assumption. The monotonicity assumption is that there are no defiers.

  • No one consistently does the opposite of what they are told.
  • It is called monotonicity because the assumption is that the probability of treatment should increase with more encouragement.
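To see what the no-defiers assumption buys, the sketch below lists which compliance classes are consistent with one observed pair of assignment and treatment, with and without monotonicity. The function `candidate_classes` and its `no_defiers` argument are hypothetical names introduced for illustration.

```r
# Compliance classes consistent with one observed (z, a) pair.
# no_defiers = TRUE applies the monotonicity assumption.
# (Hypothetical helper, for illustration only.)
candidate_classes <- function(z, a, no_defiers = FALSE) {
  classes <- data.frame(
    a0 = c(0, 0, 1, 1),
    a1 = c(0, 1, 0, 1),
    label = c("never-taker", "complier", "defier", "always-taker"),
    stringsAsFactors = FALSE
  )
  if (no_defiers) classes <- classes[classes$label != "defier", ]
  # Only the potential treatment under the assigned arm is revealed
  revealed <- if (z == 0) classes$a0 else classes$a1
  classes$label[revealed == a]
}

candidate_classes(z = 1, a = 0)                     # never-taker or defier
candidate_classes(z = 1, a = 0, no_defiers = TRUE)  # only never-taker remains
candidate_classes(z = 1, a = 1, no_defiers = TRUE)  # complier or always-taker
```

Under monotonicity, people observed with Z = 1, A = 0 must be never-takers and people observed with Z = 0, A = 1 must be always-takers. The other two observed cells remain a mixture that includes compliers.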

Causal effect identification and estimation

In this section we are going to discuss identification and estimation of causal effects from instrumental variable type of analysis.

Recall that the goal is to estimate $E(Y^{A=1} - Y^{A=0} \mid \text{compliers})$. Let's begin with something we can identify, the intention-to-treat (ITT) effect:

$$E(Y^{Z=1} - Y^{Z=0}) = E(Y \mid Z=1) - E(Y \mid Z=0)$$
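Because Z is randomized, the right-hand side can be estimated by a plain difference in means between the two arms. The sketch below uses a small made-up data set (the data frame `toy` and its values are hypothetical) and shows that the slope from a regression of Y on Z gives the same number.

```r
# Hypothetical toy data: z = randomized encouragement, y = outcome
toy <- data.frame(
  z = c(1, 1, 1, 1, 0, 0, 0, 0),
  y = c(3400, 3300, 3500, 3200, 3100, 3300, 3200, 3000)
)

# ITT estimate: difference in mean outcome between the randomized arms
itt_hat <- mean(toy$y[toy$z == 1]) - mean(toy$y[toy$z == 0])

# With a binary z, the regression slope is the same difference in means
itt_lm <- unname(coef(lm(y ~ z, data = toy))["z"])

c(difference_in_means = itt_hat, regression_slope = itt_lm)
```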

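Z has no effect on the always-takers and never-takers: their treatment does not change with Z, so by the exclusion restriction neither does their outcome. Combining this with the monotonicity assumption (no defiers), we can derive the following result: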

$$
\begin{aligned}
E(Y \mid Z=1) - E(Y \mid Z=0) = &E(Y \mid Z = 1, \text{compliers})P(\text{compliers}) \\
&- E(Y \mid Z = 0, \text{compliers})P(\text{compliers})
\end{aligned}
$$

which implies

$$
\begin{aligned}
\frac{E(Y \mid Z=1) - E(Y \mid Z=0)}{P(\text{compliers})} &= E(Y \mid Z = 1, \text{compliers}) - E(Y \mid Z = 0, \text{compliers})\\
&= E(Y^{A=1} \mid \text{compliers}) - E(Y^{A=0} \mid \text{compliers})\\
&= CACE
\end{aligned}
$$
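The decomposition of the ITT above skips an intermediate step. A sketch of that step, using only the assumptions already stated (randomization of Z, the exclusion restriction, and no defiers), is:

```latex
\begin{aligned}
E(Y \mid Z=z) &= E(Y \mid Z=z, \text{always-takers})\,P(\text{always-takers}) \\
&\quad + E(Y \mid Z=z, \text{never-takers})\,P(\text{never-takers}) \\
&\quad + E(Y \mid Z=z, \text{compliers})\,P(\text{compliers}), \qquad z \in \{0, 1\} \\[6pt]
E(Y \mid Z=1, \text{always-takers}) &= E(Y \mid Z=0, \text{always-takers}) \\
E(Y \mid Z=1, \text{never-takers}) &= E(Y \mid Z=0, \text{never-takers})
\end{aligned}
```

The class probabilities do not depend on z because Z is randomized, and there is no defier term because of monotonicity. Subtracting the z = 0 expansion from the z = 1 expansion cancels the always-taker and never-taker terms, leaving only the complier term.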

Note that $E(A \mid Z=1)$ is the proportion of people who are always-takers or compliers, and $E(A \mid Z=0)$ is the proportion of people who are always-takers. Therefore P(compliers) is just $E(A \mid Z=1) - E(A \mid Z=0)$.
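As a small numeric illustration of this bookkeeping, the sketch below recovers the three class proportions from observed treatment rates in each arm. The vectors `z` and `a` are made-up data, and the calculation assumes there are no defiers.

```r
# Hypothetical toy data: z = assignment, a = treatment received
z <- c(rep(0, 10), rep(1, 10))
a <- c(rep(1, 2), rep(0, 8),   # Z = 0 arm: 2 of 10 treated
       rep(1, 7), rep(0, 3))   # Z = 1 arm: 7 of 10 treated

p_a_z1 <- mean(a[z == 1])  # always-takers + compliers
p_a_z0 <- mean(a[z == 0])  # always-takers only (no defiers assumed)

p_always   <- p_a_z0
p_never    <- 1 - p_a_z1
p_complier <- p_a_z1 - p_a_z0

c(always = p_always, never = p_never, complier = p_complier)
```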

We can derive the expression of CACE as follows:

$$CACE = \frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{E(A \mid Z = 1) - E(A \mid Z = 0)}$$

The numerator is the ITT: the causal effect of treatment assignment on the outcome. The denominator is the causal effect of treatment assignment on the treatment received.


  • With perfect compliance, CACE = ITT.
  • Under monotonicity the denominator is always between 0 and 1. Thus the CACE is at least as large in magnitude as the ITT. The ITT underestimates the CACE because some people assigned to treatment did not take it.
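The ratio formula can be checked on a constructed population where the compliance classes are known. Everything in this sketch is assumed for illustration: the class sizes, the baseline outcomes, and a treatment effect of 2. Both arms get the same class mix, which is what randomization delivers on average.

```r
# Hypothetical population: 50 compliers, 30 never-takers, 20 always-takers
# per arm, and no defiers (monotonicity holds by construction).
cls <- rep(c("complier", "never", "always"), times = c(50, 30, 20))
d <- rbind(
  data.frame(cls = cls, z = 0, stringsAsFactors = FALSE),
  data.frame(cls = cls, z = 1, stringsAsFactors = FALSE)
)

# Treatment received follows from class and assignment
d$a <- ifelse(d$cls == "always", 1, ifelse(d$cls == "never", 0, d$z))

# Outcome: class-specific baseline plus an assumed treatment effect of 2.
# z enters only through a, so the exclusion restriction holds.
baseline <- c(complier = 10, never = 8, always = 12)
d$y <- unname(baseline[d$cls]) + 2 * d$a

itt         <- mean(d$y[d$z == 1]) - mean(d$y[d$z == 0])
p_compliers <- mean(d$a[d$z == 1]) - mean(d$a[d$z == 0])
cace        <- itt / p_compliers

c(itt = itt, p_compliers = p_compliers, cace = cace)
```

Here the ITT is half the size of the CACE because only half of the population are compliers, which is the dilution described in the bullet points above.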

Two stage least squares

Two stage least squares is a method for estimating a causal effect when you have an instrumental variable.

The first stage is to regress the treatment received, A, on the instrumental variable, Z:

$$A_{i} = \alpha_{0} + Z_{i} \alpha_{1} + \epsilon_{i}$$

where the error term has mean 0 and constant variance. By randomization, $Z_{i}$ and $\epsilon_{i}$ are independent.

After that we can obtain the predicted value of A given Z: $\hat{A}_{i} = \hat{\alpha}_{0} + Z_{i}\hat{\alpha}_{1}$

The second stage is to regress the outcome, Y, on the fitted value from stage 1, $\hat{A}_{i}$:

$$Y_{i} = \beta_{0} + \hat{A}_{i} \beta_{1} + \epsilon_{i}$$

The estimate of $\beta_{1}$ is the estimate of the causal effect.
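A minimal sketch of the two stages on simulated data follows. The data-generating process is entirely assumed (an unmeasured confounder `u`, a monotone effect of `z` on treatment, and a true treatment effect of 2). With a single binary instrument, the second-stage slope coincides with the ratio of the ITT to the difference in treatment rates.

```r
set.seed(1)
n <- 2000
z <- rbinom(n, 1, 0.5)          # randomized instrument
u <- rnorm(n)                   # unmeasured confounder (hypothetical)
# Treatment depends on z and u; z can only push people toward treatment
a <- as.numeric(0.8 * z + u > 0.5)
y <- 1 + 2 * a + u + rnorm(n)   # assumed true effect of a is 2

# Stage 1: regress treatment on the instrument, keep the fitted values
s1 <- lm(a ~ z)
a_hat <- fitted(s1)

# Stage 2: regress the outcome on the fitted treatment
s2 <- lm(y ~ a_hat)
beta_2sls <- unname(coef(s2)["a_hat"])

# Ratio estimator: ITT divided by the difference in treatment rates
wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(a[z == 1]) - mean(a[z == 0]))

c(two_stage = beta_2sls, ratio = wald)
```

The standard errors reported by the second-stage `lm` are not valid, because they ignore that `a_hat` was itself estimated. Dedicated routines such as `ivreg` in the AER package compute appropriate standard errors.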

Data example in R

This section is about how to carry out an instrumental variable analysis
in R. The variables are as follows:

  • Z: indicator that the subject grew up near a 4 year college
  • A: subject’s years of education
  • Y: subject’s income
  • X: covariates such as parents' years of education, region of country,
    age, race, IQ score from a test in high school, etc.

Study motivation

More schooling is associated with higher income, but is it due to the
fact that people with more schooling are different in other ways? That
is, we are concerned about measured and unmeasured confounding.

One proposal is to use proximity to a college as an IV. Living near
a 4 year college is a type of encouragement.

# install.packages("ivpack")
library(ivpack)
# Assumption: the Card schooling data ships with ivpack as `card.data`
data(card.data)

# IV is nearc4 (near 4 year college)
# outcome is lwage (log of wage)
# 'treatment' is educ (number of years of education)

# you can take a look at descriptive statistics of the variables, but we skip this step here.

# make education binary
educ12 = card.data$educ > 12
# estimate proportion of compliers
compl_prop = mean(educ12[card.data$nearc4==1]) - mean(educ12[card.data$nearc4==0])
cat("The proportion of compliers is", compl_prop)
## The proportion of compliers is 0.1219293

We can see that the proportion is only 12%, so the instrument is not
extremely strong, but not so weak either. It seems that living near a
four year college does increase the chance of getting more than 12
years of education.

# intention to treat effect
itt = mean(card.data$lwage[card.data$nearc4==1]) - mean(card.data$lwage[card.data$nearc4==0])
cat("The intention to treat effect is", itt)
## The intention to treat effect is 0.1559075
cat("The complier average causal effect is", itt/compl_prop)
## The complier average causal effect is 1.278672
# two stage least squares
## Stage 1: regress A on Z
s1 = lm(educ12 ~ card.data$nearc4)
## get predicted value of A given Z for each subject
predtx = predict(s1, type = 'response')
table(predtx)
## predtx
## 0.422152560083588  0.54408183146614 
##               957              2053

We can see that people who had the encouragement had a probability of
0.54 of having more than 12 years of education. This is somewhat higher
than the 0.42 for those who did not have the encouragement.

# stage 2: regress Y on predicted value of A
lm(card.data$lwage ~ predtx)
## Call:
## lm(formula = card.data$lwage ~ predtx)
## Coefficients:
## (Intercept)       predtx  
##       5.616        1.279

We can see that the CACE is 1.279. It is the same as the one we
estimated before by hand.


That brings us to the end of the causal inference series. We have gone from the definition of a causal effect to the estimation of causal effects. Part 2 talks about matching, which is used when we do not have randomized trials. Part 3 is about IPTW, which is an important method to solve the problem that the subject number in the two matching groups is unbalanced. In this post, we introduced instrumental variables to cope with the situation when the ignorability assumption is not fulfilled.

My biggest impression after the past 2 months of study is that statisticians have made so much effort to develop causal inference and make it as applicable to the real world as possible. And I hope readers remember that correlation does not equal causation!

Prudence is a fountain of life to the prudent.