Causal inference has been a hot field in statistics, with important applications to observational data. In this post I will share some key concepts of causal inference:
Main reference:
Causally unrelated variables might happen to be highly correlated with each other over some period of time. E.g. the divorce rate in Maine correlates with per capita consumption of margarine.
Example: Bill Smith lived to be 105 years old. He said the secret to his longevity was eating one turnip a day.
Headlines often avoid forms of the word "cause" but are still interpreted causally.
Example: Positive link between video games and academic performance, study suggests.
In reality, how skeptically people view a headline often depends on their prior beliefs. In causal analysis, we want to move away from that to a large degree and look at the evidence as it is.
Some key points:
Even if there is a causal relationship, sometimes the direction is unclear.
Example: Urban green space and exercise. Does green space in urban environments cause people to exercise more? Or does the fact that more people come to exercise cause the government to build more green space?
The idea of causal inference attempts to do this by proposing:
Statisticians started working on causal modeling as far back as the 1920s (Wright 1921; Neyman 1923).
It has been its own area of statistical research since around the 1970s.
Some highlights:
As we dive deeper into causal modeling, it will be important to remember:
Here we will introduce some notation that is important for the rest of the post.
Suppose we are interested in the causal effect of some treatment A on some outcome Y.
Treatment example: A = 1 if the person receives the influenza vaccine; A = 0 otherwise. Here A is a binary treatment that takes the value 1 or 0.
Outcome example: Y = 1 if develop cardiovascular disease within 2 years; Y = 0 otherwise.
What are potential outcomes? You can think of them as the possible outcomes before the study takes place.
Notation: $Y^a$ is the outcome that would be observed if treatment was set to A = a.
What about counterfactuals? Counterfactual outcomes are ones that would have been observed had the treatment been different. For example: if my treatment was A = 1, then my counterfactual outcome is $Y^0$.
One important assumption behind causal effects of interventions is that the variables can be manipulated. Holland (1986) famously wrote "no causation without manipulation". For example, we can imagine an intervention in which some people get drug A while others get drug B.
However, it is less clear what a causal effect of an immutable variable, e.g. gender, age, or race, would mean. One way to approach this is to relate these variables to variables that we can manipulate.
No direct intervention | Manipulable intervention |
---|---|
Race | Name on resume |
Obesity | Bariatric surgery |
Socioeconomic status | Gift of money |
For the remainder of the post, we will primarily focus on treatments that can be thought of as interventions, that is, treatments that we can imagine being randomized (manipulated) in a hypothetical trial. The reason that we focus on causal effects of hypothetical interventions is that
In general: A had a causal effect on Y if $Y^1$ differs from $Y^0$.
The fundamental problem of causal inference is that we can only observe one potential outcome for each person, so we never know the unit-level causal effect. However, with certain assumptions, we can estimate population-level (average) causal effects. That is, rather than asking whether the treatment works for a given individual, we ask how it works for the population as a whole.
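To make the fundamental problem concrete, here is a minimal sketch with an invented four-person population. The field names `Y0` and `Y1` (potential outcomes without and with treatment) and the helper `observe` are assumptions made for illustration; both potential outcomes are visible only because the data are made up.

```python
# Hypothetical toy population. Both potential outcomes are listed only
# because the data are invented; real data never contain both.
population = [
    {"id": 1, "A": 1, "Y0": 1, "Y1": 0},
    {"id": 2, "A": 0, "Y0": 0, "Y1": 0},
    {"id": 3, "A": 1, "Y0": 1, "Y1": 1},
    {"id": 4, "A": 0, "Y0": 1, "Y1": 0},
]


def observe(person):
    """Return what a study would record: the outcome under the received treatment."""
    a = person["A"]
    return {
        "id": person["id"],
        "A": a,
        "Y": person["Y1"] if a == 1 else person["Y0"],
        # The other potential outcome is counterfactual, hence missing.
        "Y0": person["Y0"] if a == 0 else None,
        "Y1": person["Y1"] if a == 1 else None,
    }


observed = [observe(p) for p in population]

# Unit-level effects are computable only inside the toy example.
unit_effects = [p["Y1"] - p["Y0"] for p in population]
average_effect = sum(unit_effects) / len(unit_effects)

for row in observed:
    print(row)
print("average causal effect in the toy population:", average_effect)
```

Every observed row has exactly one of the two potential outcomes filled in, which is why unit-level effects cannot be read off real data and why the focus shifts to averages.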
Definition: the average causal effect is $E(Y^1 - Y^0)$.
It is the average value of Y if everyone was treated with A = 1 minus the average value of Y if everyone was treated with A = 0. If Y is binary, this is a risk difference. Please note that this is an idealized definition, because we can never observe both scenarios for the same population in the real world.
In general, $E(Y^1 - Y^0) \neq E(Y \mid A = 1) - E(Y \mid A = 0)$.
The reason is that $E(Y^1)$ is the mean of Y if the whole population was treated with A = 1, while $E(Y \mid A = 1)$ is the mean of Y among people who actually received A = 1. Technically, $E(Y \mid A = 1) - E(Y \mid A = 0)$ is not a causal effect because it compares two different populations of people.
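The gap between the average causal effect, E(Y^1) - E(Y^0), and the observed difference, E(Y | A = 1) - E(Y | A = 0), can be illustrated with a small simulation. This is a sketch under invented assumptions: a binary covariate `x` (think "poor health") raises both the chance of treatment and the risk of the outcome, and treatment lowers the risk by 0.1 in everyone. All probabilities are made up.

```python
import random


def simulate(n=100_000, seed=0):
    """Simulate (x, a, y0, y1, y) rows under a made-up confounded design."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)                   # covariate, e.g. poor health
        a = int(rng.random() < (0.8 if x else 0.2))   # sicker people are treated more often
        p_y0 = 0.6 if x else 0.2                      # risk without treatment
        p_y1 = p_y0 - 0.1                             # treatment lowers risk by 0.1
        u = rng.random()                              # shared noise for both potential outcomes
        y0, y1 = int(u < p_y0), int(u < p_y1)
        y = y1 if a == 1 else y0                      # only one outcome is observed
        rows.append((x, a, y0, y1, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = simulate()

# Average causal effect: needs both potential outcomes, so it is only
# computable in a simulation. Its expected value here is -0.10.
ace = mean(y1 - y0 for _, _, y0, y1, _ in rows)

# Observed difference between treated and untreated people.
# Its expected value under this design is 0.42 - 0.28 = +0.14.
naive = mean(y for _, a, _, _, y in rows if a == 1) - mean(
    y for _, a, _, _, y in rows if a == 0
)

print(f"average causal effect: {ace:+.3f}")
print(f"observed difference:   {naive:+.3f}")
```

In this design the treated group is mostly people in poor health, so the observed difference even has the wrong sign: it suggests harm although the treatment helps everyone.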
Other causal effects that we may be interested in are:
Identifiability of causal effects requires making some untestable assumptions. These are generally called causal assumptions.
The most common are:
It involves two assumptions:
Consistency: the potential outcome under treatment A = a is equal to the observed outcome if the actual treatment received is A = a, i.e. $Y = Y^a$ if $A = a$.
Ignorability: given pre-treatment covariates X, treatment assignment is independent of the potential outcomes, i.e. $Y^0, Y^1 \perp A \mid X$.
Essentially it means that, among people with the same values of X, treatment A can be thought of as randomly assigned.
Positivity: for every value of X, everybody has a positive probability of receiving either treatment, i.e. $P(A = a \mid X = x) > 0$ for all a and x.
If treatment was deterministic for some values of X, then we would have no observed values of Y for one of the treatment groups for those values of X.
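A rough empirical check of this last point is to tabulate the share of treated people within each level of X and flag levels where it is exactly 0 or 1. The sketch below uses invented records and hypothetical helper names (`treatment_rates`, `positivity_violations`); it can only reveal violations in the sample at hand, not prove that the assumption holds in the population.

```python
from collections import defaultdict


def treatment_rates(records):
    """Map each covariate level x to the observed share of treated (a == 1)."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_untreated, n_treated]
    for x, a in records:
        counts[x][a] += 1
    return {x: n1 / (n0 + n1) for x, (n0, n1) in counts.items()}


def positivity_violations(records):
    """Return covariate levels in which only one treatment value was observed."""
    rates = treatment_rates(records)
    return sorted(x for x, p in rates.items() if p in (0.0, 1.0))


# Invented (x, a) pairs: in this sample, children never receive the treatment.
records = [
    ("adult", 1), ("adult", 0), ("adult", 1),
    ("child", 0), ("child", 0),
    ("senior", 1), ("senior", 0),
]

rates = treatment_rates(records)
violations = positivity_violations(records)
print(rates)
print("levels with no variation in treatment:", violations)
```

For the flagged level there are no treated observations at all, so the data carry no information about the outcome under treatment for that group.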
Confounders are often defined as variables that affect treatment and affect the outcome.
We are interested in identifying a set of variables X that will make the ignorability assumption hold. Then we want to use statistical methods, which will be covered later in the course, to control for these variables and estimate causal effects.
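As a preview of what "controlling for X" can look like, here is a minimal sketch of standardization (stratifying on X and averaging over its distribution): E(Y^a) is estimated as the sum over x of E(Y | A = a, X = x) P(X = x). This is one simple technique chosen for illustration, not necessarily the method used later. The simulated data and all probabilities are invented, with a true effect of -0.1 and a single binary confounder, so ignorability given X holds by construction.

```python
import random


def simulate(n=100_000, seed=1):
    """Simulate observed (x, a, y) rows; x confounds treatment and outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        a = int(rng.random() < (0.8 if x else 0.2))
        p_y = (0.6 if x else 0.2) - 0.1 * a   # treatment lowers risk by 0.1
        y = int(rng.random() < p_y)
        data.append((x, a, y))
    return data


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def standardized_mean(data, a):
    """Estimate E(Y^a) as the sum over x of E(Y | A=a, X=x) * P(X=x).

    Raises ZeroDivisionError if some level of x has nobody with treatment a,
    which is exactly a positivity violation.
    """
    n = len(data)
    total = 0.0
    for x in {row[0] for row in data}:
        stratum = [row for row in data if row[0] == x]
        outcomes = [y for _, ai, y in stratum if ai == a]
        total += mean(outcomes) * len(stratum) / n
    return total


data = simulate()
naive = mean(y for _, a, y in data if a == 1) - mean(y for _, a, y in data if a == 0)
adjusted = standardized_mean(data, 1) - standardized_mean(data, 0)

print(f"unadjusted difference:   {naive:+.3f}")
print(f"standardized difference: {adjusted:+.3f}")
```

Under these assumptions the standardized difference lands near the true effect of -0.1, while the unadjusted difference does not. With many or continuous covariates, stratifying like this quickly becomes impractical.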
Graphs (causal graphs or directed acyclic graphs) are considered useful for causal inference. The functions of causal graphs are:
Here I will not explain all the details of DAGs; more information about compatibility between DAGs and distributions can be found here.
Instead, I would like to note down some interesting facts about paths and associations in DAGs.
Now we have an overview of causal effects, and we know that DAGs are an important tool for identifying the variables that need to be controlled for in order for the ignorability assumption to hold. Next we will see how to control for confounders. Two general approaches are matching and inverse probability of treatment weighting.