Bolin Wu

Causal Inference 1: Causal Effects and Confounding

2021-04-15 · 11 min read
R causal inference

Causal inference has been a hot field in statistics, with many applications to observational data. In this post I will share some key concepts of causal inference:

  • The confusion over causal inference
  • The important causal assumptions
  • The concept of causal effects
  • Confounding and Directed Acyclic Graphs

Main reference:

Confusion over causality

Spurious correlation

Causally unrelated variables might happen to be highly correlated with each other over some period of time. For example, the divorce rate in Maine correlates with per capita consumption of margarine.
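
To make this concrete, here is a minimal sketch with made-up numbers (not the real divorce or margarine data). The two series are generated separately and neither influences the other, but both happen to trend downward over the same ten "years", which is enough to produce a high correlation. The names `series_a`, `series_b` and the helper `pearson` are mine.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
years = range(10)
# Hypothetical series: each is a downward time trend plus its own noise.
series_a = [5.0 - 0.10 * t + rng.gauss(0, 0.05) for t in years]
series_b = [8.0 - 0.40 * t + rng.gauss(0, 0.20) for t in years]

r = pearson(series_a, series_b)
print(f"correlation = {r:.2f}")  # high, even though neither series causes the other
```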


Anecdotes

Example: Bill Smith lived to be 105 years old. He said the secret to his longevity was eating one turnip a day.

  • All we know is that Bill Smith lived to be 105 years old AND he ate one turnip a day.
  • We do not know if eating turnips contributed to his lifespan.
  • We do not know what would happen if other people adopted this habit.

Science reporting

Headlines often avoid forms of the word "cause" but still get interpreted causally.
Example: Positive link between video games and academic performance, study suggests.

In reality, how skeptically people view a headline often depends on their prior beliefs. In causal analysis, we want to move away from that as much as possible and instead look at the evidence as it is.

Some key questions to ask:

  1. What statistical methods did they use?
  2. How was the study designed?
  3. What assumptions did they make?

Reverse causality

Even if there is a causal relationship, sometimes the direction is unclear.
Example: Urban green space and exercise. Does green space in urban environments cause people to exercise more? Or does the fact that more people exercise cause the government to build more green space?

How to clear up confusion?

Causal inference attempts to clear up this confusion by proposing:

  • Formal definitions of causal effects.
  • Assumptions necessary to identify causal effects from data.
  • Rules about what variables need to be controlled for.
  • Sensitivity analysis to determine the impact of violations of assumptions on conclusions.

Statisticians started working on causal modeling as far back as the 1920s (Wright 1921; Neyman 1923).
It has been its own area of statistical research since around the 1970s.
Some highlights:

  • Re-introduction of potential outcomes: Rubin causal model (Rubin 1974).
  • Causal diagrams (Greenland and Robins 1986; Pearl 2000).
  • Propensity scores (Rosenbaum and Rubin 1983).
  • Time-dependent confounding (Robins 1986; Robins 1997).
  • Optimal dynamic treatment strategies (Murphy 2003; Robins 2004).
  • Targeted learning (van der Laan 2009).

As we dive deeper into causal modeling, it will be important to remember:

  • Causal inference requires making some untestable assumptions (referred to as causal assumptions).
  • Cochran (1972) concludes:
    "...observational studies are an interesting and challenging field which demands a good deal of humility, since we can claim only to be groping toward the truth."

Treatment, potential outcomes and counterfactuals

Here we introduce some notation that is important for the rest of the post.

Suppose we are interested in the causal effect of some treatment A on some outcome Y.
Treatment example: A = 1 if the person receives an influenza vaccine; A = 0 otherwise. Here the treatment takes two values, 1 or 0.
Outcome example: Y = 1 if the person develops cardiovascular disease within 2 years; Y = 0 otherwise.

What are potential outcomes? You can think of them as the possible outcomes before the study takes place.

Notation: $Y^{a}$ is the outcome that would be observed if treatment were set to A = a.

What about counterfactuals? Counterfactual outcomes are the ones that would have been observed had the treatment been different. For example, if my treatment was A = 1, then my counterfactual outcome is $Y^{0}$.
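
A small sketch may help keep the vocabulary straight. The `Person` class and the three people below are made up for illustration. In real data only `a` and the observed outcome would be available; both potential outcomes are visible here only because the data is invented.

```python
from dataclasses import dataclass

@dataclass
class Person:
    y0: int  # potential outcome if A = 0 (no vaccine)
    y1: int  # potential outcome if A = 1 (vaccine)
    a: int   # treatment actually received

    @property
    def observed(self) -> int:
        """The potential outcome under the treatment actually received."""
        return self.y1 if self.a == 1 else self.y0

    @property
    def counterfactual(self) -> int:
        """The potential outcome under the treatment NOT received (never seen in practice)."""
        return self.y0 if self.a == 1 else self.y1

# Hypothetical people, for illustration only.
people = [Person(y0=1, y1=0, a=1), Person(y0=0, y1=0, a=0), Person(y0=1, y1=1, a=1)]
for p in people:
    print(f"A={p.a}  observed Y={p.observed}  counterfactual Y={p.counterfactual}")
```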

Hypothetical intervention

One important assumption behind causal effects of interventions is that the variables can be manipulated. Holland (1986) famously wrote "no causation without manipulation". We can imagine manipulating treatment so that some people get drug A while others get drug B.

However, it is less clear what a causal effect of an immutable variable, e.g. gender, age or race, would mean. One way to approach this is to relate these variables to variables that we can manipulate.

| No direct intervention | Manipulable intervention |
| --- | --- |
| Race | Name on resume |
| Obesity | Bariatric surgery |
| Socioeconomic status | Gift of money |

For the remainder of the post, we will primarily focus on treatments that can be thought of as interventions, that is, treatments that we can imagine being randomized (manipulated) in a hypothetical trial. We focus on causal effects of hypothetical interventions because:

  1. Their meaning is well defined.
  2. They are potentially actionable.

What are causal effects?

In general, A has a causal effect on Y if $Y^{1}$ differs from $Y^{0}$.
The fundamental problem of causal inference is that we can only observe one potential outcome for each person, so we never know the unit-level causal effect. However, with certain assumptions, we can estimate population-level (average) causal effects. That is, rather than asking whether the treatment works for a given individual, we ask about the population as a whole.

  • Hopeless: What would have happened to me had I not taken ibuprofen? (unit-level causal effect)
  • Possible: What would the rate of headache remission be if everyone took ibuprofen when they had a headache versus no one did?

Average Causal Effect

Definition: $E(Y^{1} - Y^{0})$.
This is the average value of Y if everyone was treated with A = 1 minus the average value of Y if everyone was treated with A = 0. If Y is binary, this is a risk difference. Note that this is an idealized definition: we can never observe both quantities for the same population in the real world.

Conditioning on vs. setting treatment

In general,

$$E(Y^{1} - Y^{0}) \neq E(Y \mid A = 1) - E(Y \mid A = 0)$$

The reason is that $E(Y^{1})$ is the mean of Y if the whole population was treated with A = 1, while $E(Y \mid A = 1)$ is the mean of Y among the people who actually received A = 1. $E(Y \mid A = 1) - E(Y \mid A = 0)$ is generally not a causal effect because it compares two different populations of people.
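
A simulation can show how far apart the two quantities can be. This is a sketch under assumptions I made up: a binary confounder `x` that raises both the chance of treatment and the risk of the outcome, and a treatment that lowers risk by 0.1 for everyone. Because the data is simulated, both potential outcomes are available and the true average causal effect can be computed directly.

```python
import random

rng = random.Random(42)
n = 200_000

sum_y1 = sum_y0 = 0.0           # totals of the potential outcomes over everyone
treated_y, control_y = [], []   # observed outcomes, split by treatment received

for _ in range(n):
    x = 1 if rng.random() < 0.5 else 0            # hypothetical confounder
    # Potential outcomes: treatment lowers risk by 0.1, X raises risk by 0.4.
    y0 = 1 if rng.random() < 0.2 + 0.4 * x else 0
    y1 = 1 if rng.random() < 0.1 + 0.4 * x else 0
    # People with X = 1 are much more likely to be treated.
    a = 1 if rng.random() < (0.8 if x else 0.2) else 0
    sum_y1 += y1
    sum_y0 += y0
    (treated_y if a else control_y).append(y1 if a else y0)

ace = (sum_y1 - sum_y0) / n     # E(Y^1 - Y^0), close to -0.1 by construction
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)

print(f"average causal effect  = {ace:+.3f}")
print(f"E(Y|A=1) - E(Y|A=0)    = {naive:+.3f}")
```

In this setup the naive difference even has the opposite sign: the treated group contains more high-risk people, so the treatment looks harmful although it is protective.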

Other causal effects

Other causal effects that we may be interested in are:

  • $E(Y^{1}) / E(Y^{0})$ : causal relative risk
  • $E(Y^{1} - Y^{0} \mid A = 1)$ : causal effect of treatment on the treated
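
As a quick worked example, the sketch below computes these quantities for six made-up people whose potential outcomes are both known (which is only possible with invented data). The relative risk is taken here as the ratio of the two mean potential outcomes.

```python
# (y0, y1, a) for six hypothetical people.
people = [(1, 0, 1), (1, 0, 1), (0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 0)]

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

mean_y1 = mean(y1 for _, y1, _ in people)   # average outcome if everyone were treated
mean_y0 = mean(y0 for y0, _, _ in people)   # average outcome if no one were treated

ace = mean_y1 - mean_y0                      # average causal effect
relative_risk = mean_y1 / mean_y0            # causal relative risk
att = mean(y1 - y0 for y0, y1, a in people if a == 1)  # effect among the treated only

print(ace, relative_risk, att)
```

Here the effect among the treated is larger in magnitude than the average causal effect, because in this toy data the treatment only helps the people who happened to receive it.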


This raises two questions:

  1. How do we use observed data to link observed outcomes to potential outcomes?
  2. What assumptions are necessary to estimate causal effects from observed data?

Causal assumptions

Identifiability of causal effects requires making some untestable assumptions. These are generally called causal assumptions.
The most common are:

  • Stable Unit Treatment Value Assumption (SUTVA)
  • Consistency
  • Ignorability
  • Positivity

They are all about the observed data: the outcome Y, the treatment A and a set of pre-treatment covariates X.


SUTVA involves two assumptions:

  1. No interference:
    • Units do not interfere with each other.
    • Treatment assignment of one unit does not affect the outcome of another unit.
    • "Spillover" and "contagion" are also terms for interference.
  2. One version of treatment:
    • Each treatment comes in a single version, so the potential outcomes can be effectively linked to the observed data.

SUTVA allows us to write the potential outcome for the ith person in terms of only that person's treatment.

Consistency assumption

The potential outcome under treatment A = a is equal to the observed outcome if the actual treatment received is A = a.

Ignorability assumption

Given pre-treatment covariates X, treatment assignment is independent of the potential outcomes.

$$Y^{0}, Y^{1} \perp\!\!\!\perp A \mid X$$

Essentially, it means that among people with the same values of X, treatment A can be thought of as randomly assigned.
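
One consequence is worth writing out as a short sketch. Combining consistency with ignorability, for any stratum X = x that contains people with A = a:

$$
\begin{aligned}
E(Y \mid A = a, X = x) &= E(Y^{a} \mid A = a, X = x) && \text{(consistency)} \\
&= E(Y^{a} \mid X = x) && \text{(ignorability)}
\end{aligned}
$$

So within levels of X, the observed mean among people with A = a can stand in for the mean potential outcome, which is what links observed data to potential outcomes.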

Positivity assumption

Positivity means that everyone has a positive probability of receiving each treatment, whatever their values of X.

$$P(A = a \mid X = x) > 0 \quad \text{for all } a \text{ and } x$$

If treatment were deterministic for some values of X, then we would have no observed values of Y for one of the treatment groups for those values of X.
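
In practice one can at least look for strata where a treatment group is empty. The sketch below is a simple empirical check on made-up records, with hypothetical helper names `treatment_rates` and `positivity_violations`. It only flags strata with no observed variation in treatment, so it cannot prove that positivity holds in the population.

```python
from collections import defaultdict

def treatment_rates(records):
    """Return {x: share of units with A = 1} from (x, a) pairs, with a in {0, 1}."""
    counts = defaultdict(lambda: [0, 0])   # x -> [number with A = 0, number with A = 1]
    for x, a in records:
        counts[x][a] += 1
    return {x: n1 / (n0 + n1) for x, (n0, n1) in counts.items()}

def positivity_violations(records):
    """Strata of X in which only one treatment value was ever observed."""
    rates = treatment_rates(records)
    return sorted(x for x, rate in rates.items() if rate in (0.0, 1.0))

# Made-up data: nobody in the "80+" stratum is untreated.
records = [("under 60", 0), ("under 60", 1), ("60-79", 0), ("60-79", 1),
           ("80+", 1), ("80+", 1)]

print(treatment_rates(records))
print(positivity_violations(records))
```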

Confounding and Directed Acyclic Graphs (DAGs)

Confounding control

Confounders are often defined as variables that affect treatment and affect the outcome.

We are interested in identifying a set of variables X that will make the ignorability assumption hold. Then we want to use statistical methods, which will be covered in later posts, to control for these variables and estimate causal effects.
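
To give a feel for what "controlling for X" can buy, here is a minimal sketch of standardization, that is, averaging the stratum-specific means $E(Y \mid A = a, X = x)$ over the distribution of X. This is a swapped-in technique chosen for its simplicity, not the matching or weighting approaches this post points to. The simulated data and the function name `standardized_mean` are my own assumptions, and the sketch assumes that the single binary `x` is enough for ignorability.

```python
import random
from collections import defaultdict

rng = random.Random(7)
data = []  # observed (x, a, y) triples only; potential outcomes stay hidden
for _ in range(200_000):
    x = 1 if rng.random() < 0.5 else 0
    a = 1 if rng.random() < (0.8 if x else 0.2) else 0
    # Simulation assumption: treatment lowers risk by 0.1 in both strata of X.
    y = 1 if rng.random() < 0.2 + 0.4 * x - 0.1 * a else 0
    data.append((x, a, y))

def standardized_mean(data, a_value):
    """Estimate E(Y^a) as the sum over x of E(Y | A=a, X=x) * P(X=x)."""
    totals = defaultdict(lambda: [0, 0])   # x -> [sum of y, count] among A = a_value
    x_counts = defaultdict(int)            # x -> count in the whole sample
    for x, a, y in data:
        x_counts[x] += 1
        if a == a_value:
            totals[x][0] += y
            totals[x][1] += 1
    n = len(data)
    # Raises ZeroDivisionError if some stratum has nobody with A = a_value,
    # i.e. when positivity fails in the sample.
    return sum((totals[x][0] / totals[x][1]) * (x_counts[x] / n) for x in x_counts)

ace_hat = standardized_mean(data, 1) - standardized_mean(data, 0)
print(f"standardized estimate of the average causal effect = {ace_hat:+.3f}")
```

The estimate lands close to the -0.1 built into the simulation, even though treated and untreated people differ a lot in X.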

Causal graph

Causal graphs (directed acyclic graphs, DAGs) are useful for causal inference. They:

  • Help identify which variables to control for.
  • Make assumptions explicit.

Here I will not explain all the details of DAGs; more information about compatibility between DAGs and distributions can be found here.

Instead, I would like to note some interesting facts about paths and associations in DAGs.

  1. For a fork $A \leftarrow E \rightarrow B$, A and B are dependent because information from E flows to both A and B.
  2. For a collider $A \rightarrow G \leftarrow B$, A and B are independent. However, if we control for G, then A and B become dependent.
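
Both facts are easy to check by simulation. The sketch below uses standard normal variables of my own choosing, and "controlling for" G is approximated by restricting to units with a large value of G, which is one simple way to condition on a collider.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)
n = 50_000

# Fork: A <- E -> B. A and B share the common cause E.
e = [rng.gauss(0, 1) for _ in range(n)]
fork_a = [ei + rng.gauss(0, 1) for ei in e]
fork_b = [ei + rng.gauss(0, 1) for ei in e]
fork_corr = pearson(fork_a, fork_b)          # about 0.5 by construction

# Collider: A -> G <- B, with A and B generated independently.
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
g = [ai + bi + rng.gauss(0, 1) for ai, bi in zip(a, b)]
collider_corr = pearson(a, b)                # close to 0

# Condition on the collider by keeping only units with large G.
kept = [(ai, bi) for ai, bi, gi in zip(a, b, g) if gi > 1.0]
conditioned_corr = pearson([ai for ai, _ in kept], [bi for _, bi in kept])

print(f"fork:                   corr(A, B) = {fork_corr:+.2f}")
print(f"collider:               corr(A, B) = {collider_corr:+.2f}")
print(f"collider, given G > 1:  corr(A, B) = {conditioned_corr:+.2f}")
```

Among units with large G, a small A has to be compensated by a large B, so conditioning on G induces a negative association between two variables that were generated independently.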


Now we have an overview of causal effects, and we know that DAGs are an important tool for identifying the variables that need to be controlled for in order to satisfy the ignorability assumption. Next we will see how to control for confounders. Two general approaches are matching and inverse probability of treatment weighting.
