In Part 1 we talked about the basic concepts of causal effects and confounding. In this post we will discuss how to control for confounders with matching.
The main content consists of:
In a randomized trial, treatment assignment A would be determined by a coin toss, which erases the effect of X on A. It is a good way to get rid of confounding. However, researchers cannot always use randomized trials, for the following reasons:
Observational studies can be planned, prospective studies with active data collection:
Matching is a method that attempts to make an observational study more like a randomized trial.
Main idea: Match individuals in the treated group A = 1 to individuals in the control group A = 0 on the covariates X.
Advantages of matching:
Sometimes it is difficult to find great matches. We may be willing to accept some non-ideal matches if the treated and control groups have the same distribution of covariates. For example:
We can see that neither match 1 nor match 2 is a good match on its own. However, if we look at them together, we will find that the average age and the percentage of males/females are the same in the two groups. This is called fine balance. A lot of software can impose fine balance.
Since in real data we typically cannot match exactly, we first need to choose a metric of closeness. There are two common options:
Mahalanobis distance is defined as follows. Denote the vector of covariates for subject j by $$X_j$$ and the covariance matrix of the covariates by $$S$$. Then the Mahalanobis distance between subjects j and k is $$D(X_j, X_k) = \sqrt{(X_j - X_k)^T S^{-1} (X_j - X_k)}$$.
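As a quick illustration (with made-up data, not from the post), base R's mahalanobis() function returns squared Mahalanobis distances, so we take the square root:
# Minimal sketch: Mahalanobis distance from one treated subject to
# every control on two hypothetical covariates.
set.seed(1)
X_control = data.frame(age = rnorm(100, 40, 10),
                       income = rnorm(100, 50, 15))  # hypothetical controls
x_treated = c(age = 35, income = 55)                 # one treated subject
S = cov(X_control)                                   # covariance matrix S
d = sqrt(mahalanobis(X_control, center = x_treated, cov = S))
head(sort(d))                                        # closest controls first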
The robust Mahalanobis distance aims to solve the outlier problem: outliers can create large distances between subjects even when the covariates are otherwise similar. It is applied as follows.
Steps: (1) replace each covariate value with its rank; (2) compute the covariance matrix of the ranks; (3) compute the usual Mahalanobis distance on the ranks, as in the sketch below.
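A minimal sketch of these steps with made-up data (some implementations also adjust the diagonal of the covariance matrix, which is omitted here):
# Robust Mahalanobis distance: replace each covariate by its rank so
# that outliers cannot dominate, then compute the distance on the ranks.
set.seed(1)
X = data.frame(age = c(35, rnorm(100, 40, 10)),      # row 1 = treated subject
               income = c(55, rnorm(100, 50, 15)))   # rows 2:101 = controls
R = apply(X, 2, rank)      # ranks within each covariate
S_r = cov(R)               # covariance matrix of the ranks
d_robust = sqrt(mahalanobis(R[-1, ], center = R[1, ], cov = S_r))
head(sort(d_robust))       # nearest controls under the robust metric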
We might prefer to exclude treated subjects for whom there does not exist a good match. A bad match can be defined using a caliper: a maximum acceptable distance. Main idea: match each treated subject to the nearest available control, but accept the match only if the distance is within the caliper; otherwise leave that treated subject unmatched (see the sketch below).
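Here is a from-scratch sketch of greedy matching with a caliper, for intuition only; packages such as Matching implement this properly:
# Greedy matching on a single score (e.g., a propensity score): each
# treated subject grabs the nearest unused control, and matches whose
# distance exceeds the caliper are rejected.
greedy_match = function(s_treated, s_control, caliper) {
  matches = rep(NA_integer_, length(s_treated))
  available = rep(TRUE, length(s_control))
  for (i in seq_along(s_treated)) {
    dist = abs(s_control - s_treated[i])  # distance to every control
    dist[!available] = Inf                # skip controls already used
    j = which.min(dist)
    if (dist[j] <= caliper) {             # accept only good matches
      matches[i] = j
      available[j] = FALSE                # matching without replacement
    }
  }
  matches                                 # NA = treated subject excluded
}
greedy_match(c(0.30, 0.32), c(0.31, 0.50), caliper = 0.05)  # -> 1 NA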
Why is greedy matching not optimal? Because it does not lead to the smallest total distance. For example, suppose treated subjects have covariate values 5 and 6 and controls have values 5.1 and 4. Greedy matching pairs 5 with 5.1 (distance 0.1) and is then forced to pair 6 with 4 (distance 2), for a total of 2.1; the optimal pairing (5 with 4, 6 with 5.1) has total distance 1.9.
Optimal matching minimizes a global distance measure. It is computationally demanding; whether it is feasible depends on the size of the problem, measured by the number of possible treatment-control pairings. The R packages "optmatch" and "rcbalance" can do the work.
Constraints can be imposed to make optimal matching computationally feasible for larger data sets, which brings us to sparse optimal matching.
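As a hedged sketch of the optmatch workflow (using the lalonde data introduced later in this post; the choice of covariates in the formula is illustrative):
# Optimal 1:1 matching with optmatch: match_on() builds a treated-by-
# control distance matrix, pairmatch() finds the pairing with the
# smallest total distance.
# install.packages("optmatch")
library(optmatch)
library(MatchIt)                           # provides the lalonde data
data(lalonde)
dist_mat = match_on(treat ~ age + educ + re74, data = lalonde)
pm = pairmatch(dist_mat, data = lalonde)   # optimal pair matching
summary(pm)                                # matched-set structure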
After we have matched, we should assess whether the matching seemed to work. We can look at a couple of things.
After successfully matching and achieving adequate balance, proceed with outcome analysis.
The main idea is as follows:
This procedure is equivalent to the McNemar test for paired binary data or the paired t-test for continuous data (see the sketch below).
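A minimal sketch of both tests on hypothetical matched-pair outcomes (all names and numbers here are made up):
# One row per matched pair: treated-member and control-member outcomes.
set.seed(1)
y_treated = rnorm(50, mean = 1.0)          # continuous outcome, treated
y_control = rnorm(50, mean = 0.5)          # continuous outcome, control
t.test(y_treated - y_control)              # paired t-test via pair differences

b_treated = rbinom(50, 1, 0.6)             # binary outcome, treated
b_control = rbinom(50, 1, 0.4)             # binary outcome, control
mcnemar.test(table(b_treated, b_control))  # McNemar test on discordant pairs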
The matching described so far aims to achieve balance on observed covariates. Imbalance on observed covariates is called overt bias. However, if there is imbalance on variables that we did not match on (including unobserved variables), then we have hidden bias, and the ignorability assumption is violated. The main idea of sensitivity analysis: if there is hidden bias, determine how severe it would have to be to change our conclusions.
Let $$\pi_j$$ be the probability that person j receives treatment and $$\pi_k$$ the probability that person k receives treatment. Suppose persons j and k are perfectly matched, so that their observed covariates $$X_j$$ and $$X_k$$ are the same. If $$\pi_j = \pi_k$$, then there is no hidden bias. Consider the following inequality: $$\frac{1}{\Gamma} \leq \frac{\pi_j/(1-\pi_j)}{\pi_k/(1-\pi_k)} \leq \Gamma$$, where $$\Gamma$$ is a bound on the odds ratio of treatment. If $$\Gamma = 1$$, there is no hidden bias; $$\Gamma > 1$$ allows for hidden bias.
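One way to operationalize this is with Rosenbaum bounds. The sketch below uses the rbounds package (my choice for illustration; the post itself does not prescribe one) to see how the p-value degrades as $$\Gamma$$ grows:
# Rosenbaum-bounds sensitivity analysis for matched pairs: how large
# would Gamma need to be before the conclusion changes?
# install.packages("rbounds")
library(rbounds)
set.seed(1)
y_treated = rnorm(100, mean = 1)   # hypothetical matched-pair outcomes
y_control = rnorm(100, mean = 0)
psens(y_treated, y_control, Gamma = 2, GammaInc = 0.25)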
The propensity score is the probability of receiving treatment, rather than control, given covariates X. Define A = 1 for treatment and A = 0 for control. We will denote the propensity score for subject i by $$\pi_i = P(A = 1 | X_i)$$.
If a person i has a propensity score value of 0.3, that means, given their particular covariate values, there is a 30% chance they will be treated.
Suppose two subjects have the same value of the propensity score but possibly different covariate values X. Despite the different covariate values, they were both equally likely to have been treated. This means that both subjects' values of X are just as likely to be found in the treatment group. If we restrict to a subpopulation of subjects who share the same value of the propensity score, there should be balance between the two treatment groups. The propensity score is therefore a balancing score. More formally, $$P(X = x \mid \pi(X) = p, A = 1) = P(X = x \mid \pi(X) = p, A = 0)$$.
Implication: if we match on the propensity score, we should achieve balance. Another way of thinking about this: we assumed ignorability, i.e., that treatment is effectively randomized given X; conditioning on the propensity score is then the same as conditioning on an allocation probability.
In a randomized trial, the propensity score is generally known. However, in an observational study it is unknown. Notice that the propensity score involves only observed data, A and X, so we can estimate it. Typically, when people talk about a propensity score, they are referring to the estimated propensity score. We need to estimate $$P(A=1|X)$$, with A as the outcome. If A is binary, we can estimate the propensity score using logistic regression.
If there is a lack of overlap, trimming the tails is an option. This means removing subjects who have extreme values of the propensity score. For example, removing control subjects whose propensity score is less than the minimum in the treatment group. Trimming can make the positivity assumption more reasonable.
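A minimal sketch of trimming to the region of common support, with made-up scores rather than the post's data:
# Keep only subjects whose propensity score lies inside the range
# shared by both treatment groups (the region of common support).
set.seed(1)
A = rbinom(200, 1, 0.5)                                       # treatment indicator
pscore_fake = plogis(rnorm(200, mean = ifelse(A == 1, 0.5, -0.5)))
lo = max(min(pscore_fake[A == 1]), min(pscore_fake[A == 0]))  # shared lower bound
hi = min(max(pscore_fake[A == 1]), max(pscore_fake[A == 0]))  # shared upper bound
keep = pscore_fake >= lo & pscore_fake <= hi                  # inside the overlap
mean(keep)                                                    # fraction retained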
We can proceed by computing a distance between the propensity score for each treated subject with every control. Then we use the nearest neighbor or optimal matching as before. In practice, logit of the propensity score is often used, rather than the propensity score itself. The reason is that the propensity score is bounded between 0 and 1, making many values similar. However, the logit of the propensity score is unbounded. This transformation essentially stretches the distribution, while preserving ranks.
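For example, base R's qlogis() is the logit function; note how it spreads out values near 0 and 1 while keeping their order:
# logit stretches (0, 1) onto the whole real line, preserving ranks
ps = c(0.01, 0.05, 0.50, 0.95, 0.99)
round(qlogis(ps), 2)   # -4.60 -2.94  0.00  2.94  4.60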
To ensure that we do not accept any bad matches, a caliper can be used. In practice, a common choice of caliper is 0.2 times the standard deviation of the logit of the propensity score. The procedure is as follows:
Note that 0.2 is only a convention and can be set to another value. Propensity score matching can be done with the R package MatchIt.
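A hedged sketch of the 0.2 x SD rule with Matching::Match, reusing the pscore object estimated later in this post. Match() interprets its caliper argument in standard-deviation units of X, so matching on the logit scale with caliper = 0.2 implements the rule:
# Greedy 1:1 matching on logit(propensity score) with a caliper of
# 0.2 standard deviations; Match()'s caliper is already in SD units.
library(Matching)
logit_ps = qlogis(pscore)   # pscore from the glm() fit shown below
psmatch = Match(Tr = lalonde$treat, M = 1, X = logit_ps,
                replace = F, caliper = 0.2)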
We will use data from Lalonde (1986), which aimed to evaluate the impact of the National Supported Work (NSW) Demonstration, a labor training program, on post-intervention income levels. Interest is in estimating the causal effect of this training program on income.
First load the packages tidyverse, tableone, Matching, and MatchIt:
# install.packages("tableone")
# install.packages("Matching")
# install.packages("MatchIt")
library(tidyverse)
library(tableone)
library(Matching)
library(MatchIt)
data(lalonde)
str(lalonde)
The data have n = 614 subjects and 9 variables:
The outcome is re78, post-intervention income. The treatment is treat, which equals 1 if the subject received the labor training and 0 otherwise.
The potential confounding variables are: age, educ, race (black, hispan, or white), married, nodegree, re74, and re75.
Before matching, let us find the standardized differences for all of the confounding variables.
# covariates, excluding treatment and outcome
xvars = colnames(lalonde %>% select(-c("treat","re78")))
# pre-matching covariate table with standardized mean differences (SMD)
table1 = CreateTableOne(vars = xvars, strata = "treat", data = lalonde, test = F)
print(table1, smd = T)
Stratified by treat
0 1 SMD
n 429 185
age (mean (SD)) 28.03 (10.79) 25.82 (7.16) 0.242
educ (mean (SD)) 10.24 (2.86) 10.35 (2.01) 0.045
race (%) 1.701
black 87 (20.3) 156 (84.3)
hispan 61 (14.2) 11 ( 5.9)
white 281 (65.5) 18 ( 9.7)
married (mean (SD)) 0.51 (0.50) 0.19 (0.39) 0.719
nodegree (mean (SD)) 0.60 (0.49) 0.71 (0.46) 0.235
re74 (mean (SD)) 5619.24 (6788.75) 2095.57 (4886.62) 0.596
re75 (mean (SD)) 2466.48 (3292.00) 1532.06 (3219.25) 0.287
Q1. What is the standardized difference for married (to nearest hundredth)?
Answer: From the printed table above, the standardized difference for married is 0.719, i.e., 0.72 to the nearest hundredth.
Q2. What is the raw (unadjusted) mean of real earnings in 1978 for treated subjects minus the mean of real earnings in 1978 for untreated subjects?
re78_mean = lalonde %>% group_by(treat) %>% summarise_at("re78", .funs = mean)
(re78_mean %>% filter(treat ==1) - re78_mean %>% filter(treat ==0))$re78
# The difference is -635
Now let us estimate a propensity score model using logistic regression, where the outcome is the treatment, and obtain the propensity score for each subject.
# logistic regression of treatment on all covariates except the outcome
psmodel = glm(treat ~ . - re78, family = binomial(), data = lalonde)
summary(psmodel)
# extract the fitted propensity scores
pscore = psmodel$fitted.values
Carry out 1:1 greedy propensity score matching without replacement using the Match function:
psmatch = Match(Tr = lalonde$treat, M = 1, X = pscore, replace = F)
# stack the matched treated and control rows into one data set
matched = lalonde[unlist(psmatch[c("index.treated","index.control")]),]
matchedtab1 = CreateTableOne(vars = xvars, strata = "treat", data = matched, test = F)
print(matchedtab1,smd = T)
Stratified by treat
0 1 SMD
n 185 185
age (mean (SD)) 25.22 (10.57) 25.82 (7.16) 0.066
educ (mean (SD)) 10.54 (2.72) 10.35 (2.01) 0.079
race (%) 0.855
black 87 (47.0) 156 (84.3)
hispan 41 (22.2) 11 ( 5.9)
white 57 (30.8) 18 ( 9.7)
married (mean (SD)) 0.21 (0.41) 0.19 (0.39) 0.041
nodegree (mean (SD)) 0.64 (0.48) 0.71 (0.46) 0.139
re74 (mean (SD)) 2383.67 (4287.87) 2095.57 (4886.62) 0.063
re75 (mean (SD)) 1651.45 (2671.39) 1532.06 (3219.25) 0.040
From the SMD column we can see that the values are smaller, which is exactly what we want. A common rule of thumb is 0.1: we pay extra attention to covariates with SMD > 0.1, since a large value may indicate remaining imbalance.
Q3. What is the standardized difference for married?
Answer: It is reduced to 0.041. We end up with better matching.
We can also carry out propensity score matching using the matchit function, which makes it easy to inspect the propensity score overlap. The default distance measure is the logistic regression propensity score.
# use matchit for propensity score, nearest neighbor matching
# the default measure is logistic regression propensity scores.
m.out = matchit(treat~. - re78, data = lalonde, method = "nearest")
summary(m.out)
# propensity score plots
plot(m.out, type = "hist")
plot(m.out, type = "jitter")
Check overlap by jitter plot
Check overlap by histogram
Re-do the matching and set the caliper = 0.1.
# re-do matching using a caliper
psmatch = Match(Tr = lalonde$treat, M = 1, X = pscore, replace = F,caliper = 0.1)
matched = lalonde[unlist(psmatch[c("index.treated","index.control")]),]
matchedtab1 = CreateTableOne(vars = xvars, strata = "treat", data = matched, test = F)
print(matchedtab1,smd = T)
Stratified by treat
0 1 SMD
n 111 111
age (mean (SD)) 26.46 (11.39) 26.22 (7.18) 0.026
educ (mean (SD)) 10.39 (2.70) 10.25 (2.31) 0.054
race (%) 0.048
black 80 (72.1) 82 (73.9)
hispan 11 ( 9.9) 11 ( 9.9)
white 20 (18.0) 18 (16.2)
married (mean (SD)) 0.22 (0.41) 0.24 (0.43) 0.064
nodegree (mean (SD)) 0.63 (0.48) 0.65 (0.48) 0.037
re74 (mean (SD)) 2545.63 (4347.60) 2250.49 (5746.14) 0.058
re75 (mean (SD)) 1740.06 (2725.04) 1222.25 (3081.19) 0.178
Q4. For the matched data, what is the mean of real earnings in 1978 for treated subjects minus the mean of real earnings in 1978 for untreated subjects?
y_trt = matched$re78[matched$treat==1]
y_con = matched$re78[matched$treat==0]
mean(y_trt) - mean(y_con)
[1] 893.4831
The difference is around 893.
Q5. Carry out a paired t-test for the effect of treatment on earnings. What are the values of the 95% confidence interval?
diffy = y_trt - y_con
# paired t-test
t.test(diffy)
One Sample t-test
data: diffy
t = 1.0297, df = 110, p-value = 0.3054
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-826.1391 2613.1052
sample estimates:
mean of x
893.483
From the result, we cannot reject the null hypothesis that there is no effect of treatment on earnings.
In this post we introduced a powerful method for controlling for confounders: matching on the propensity score. We also demonstrated how to implement the analysis in R. In general, propensity score matching involves the following steps: (1) estimate the propensity score, e.g., by logistic regression; (2) match treated to control subjects on the (logit of the) propensity score, possibly with a caliper; (3) check overlap and assess covariate balance; (4) carry out the outcome analysis on the matched data.
My biggest personal takeaway is that performing matching is not difficult; the care lies in evaluating its results. What does the overlap of the matched propensity scores look like? Are the covariates balanced? What does the treatment-difference test tell us? These questions need to be thoroughly considered.
Next, we will discuss how to estimate the causal effect with inverse probability of treatment weighting. Thank you for reading.