Propensity Score Matching

The idea with propensity score methods is to compare units who, based on observables, had very similar probabilities of being placed into the treatment group even though those units differed with regards to actual treatment assignment. If conditional on X, two units have the same probability of being treated, then we say they have similar propensity scores. If two units have the same propensity score, but one is the treatment group and the other is not, and the conditional independence assumption credibly holds in the data, then differences between their observed outcomes are attributable to the treatment.

In this tutorial, we aim to examine the effect of the mother smoked during pregnancy on birth weight of baby. Because the birth weight of babies who have the mother smoked on average are different from babies who have the mother does not smoke, we will use propensity score matching method to get more creditable casual estimates of smoking during pregnancy.

Variables:

mbsmoke: the treatment variable - 1 if mother smoked.
bweight: the outcome variable - infant birthweight (grams)
married: 1 if mother married
mage: mother’s age
medu: mother’s education attainment
fbaby: 1 if first baby

First, we need to load packages and data:

library(MatchIt)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
birth_weight <- read.csv("C:/Users/DuyTung/Desktop/Birth_Weight.csv")

To view the first few records in the dataset

head(birth_weight)

##   bweight mbsmoke mmarried mage medu fbaby
## 1    3459       0        1   24   14     0
## 2    3260       0        0   20   10     0
## 3    3572       0        1   22    9     0
## 4    2948       0        1   26   12     0
## 5    2410       0        1   20   12     1
## 6    3147       0        0   27   12     1

1. Descriptive Statistics

Let’s do a summary statistic that quantitatively describes main characteristics of variables in the data set.

library(pastecs)

## 
## Attaching package: 'pastecs'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

stat.desc(birth_weight, basic=F)

##                   bweight     mbsmoke    mmarried        mage        medu
## median       3.390000e+03 0.000000000 1.000000000 26.00000000 12.00000000
## mean         3.361680e+03 0.186126670 0.699698406 26.50452391 12.68957346
## SE.mean      8.495534e+00 0.005713167 0.006728658  0.08247237  0.03699661
## CI.mean.0.95 1.665528e+01 0.011200523 0.013191368  0.16168505  0.07253094
## var          3.350322e+05 0.151516173 0.210165822 31.57345507  6.35373311
## std.dev      5.788196e+02 0.389250784 0.458438460  5.61902617  2.52066124
## coef.var     1.721817e-01 2.091321920 0.655194375  0.21200253  0.19864034
##                   fbaby
## median       0.00000000
## mean         0.43795778
## SE.mean      0.00728274
## CI.mean.0.95 0.01427763
## var          0.24620380
## std.dev      0.49618928
## coef.var     1.13296145

1.1 Pre-analysis: Comparing groups for statistical differences

Outcome variable

birth_weight %>%
  group_by(mbsmoke) %>%
  summarise(obs = n(),
            mean = mean(bweight),
            std_error = sd(bweight)/sqrt(obs))

## # A tibble: 2 x 4
##   mbsmoke   obs  mean std_error
##     <int> <int> <dbl>     <dbl>
## 1       0  3778 3413.      9.28
## 2       1   864 3138.     19.1

t-test

with(birth_weight, t.test(bweight ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  bweight by mbsmoke
## t = 12.971, df = 1303.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  233.6210 316.8827
## sample estimates:
## mean in group 0 mean in group 1 
##        3412.912        3137.660

The result of t-test shows that the birth weight of babies who have the mother smoked are significantly lower from babies who have the mother does not smoke. Should we conclude that the treatment (smoking during pregnancy) is dangerous? Maybe! However, is this a reliable result? Before answering this question, we need to check whether there is a difference in covariates between two groups.

Covariates

Here is the mean for each covariate by the treatment status:

bw_cov <- c("mmarried","mage","medu", "fbaby")
birth_weight %>%
  group_by(mbsmoke) %>%
  select(one_of(bw_cov)) %>%
  summarise_all(funs(mean(., na.rm = T)))

## Adding missing grouping variables: `mbsmoke`

## Warning: funs() is soft deprecated as of dplyr 0.8.0
## Please use a list of either functions or lambdas: 
## 
##   # Simple named list: 
##   list(mean = mean, median = median)
## 
##   # Auto named with `tibble::lst()`: 
##   tibble::lst(mean, median)
## 
##   # Using lambdas
##   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once per session.

## # A tibble: 2 x 5
##   mbsmoke mmarried  mage  medu fbaby
##     <int>    <dbl> <dbl> <dbl> <dbl>
## 1       0    0.751  26.8  12.9 0.453
## 2       1    0.473  25.2  11.6 0.372

In addition, we carry out t-tests to evaluate whether these means are statistically distinguishable: The ‘mmarried’ variable

with(birth_weight, t.test(mmarried ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  mmarried by mbsmoke
## t = 15.118, df = 1175.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2419887 0.3141636
## sample estimates:
## mean in group 0 mean in group 1 
##       0.7514558       0.4733796

The ‘mage’ variable

with(birth_weight, t.test(mage ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  mage by mbsmoke
## t = 8.1218, df = 1348, p-value = 1.028e-15
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.246770 2.040861
## sample estimates:
## mean in group 0 mean in group 1 
##        26.81048        25.16667

The ‘medu’ variable

with(birth_weight, t.test(medu ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  medu by mbsmoke
## t = 15.279, df = 1454.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.125228 1.456708
## sample estimates:
## mean in group 0 mean in group 1 
##        12.92986        11.63889

The ‘fbaby’ variable

with(birth_weight, t.test(fbaby ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  fbaby by mbsmoke
## t = 4.4517, df = 1314.6, p-value = 9.24e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.04565306 0.11759102
## sample estimates:
## mean in group 0 mean in group 1 
##       0.4531498       0.3715278

2. Propensity score estimation

Now let’s prepare a logit (probit) model to estimate the propensity scores. That is, we will use the covariates to predict the probability of being exposed (smoking during pregnancy). The more true covariates we use, the better our prediction of the probability of being exposed. We calculate a propensity score for all subjects, exposed and unexposed. In particular, we will create the logit model of the treatment (i.e. smoking during pregnancy) regressed upon the following covariates: mmarried, mage, medu, and fbaby.

pscores.model <- glm(mbsmoke ~ mmarried + mage + medu + fbaby, family = binomial("logit"), data = birth_weight)
summary(pscores.model)

## 
## Call:
## glm(formula = mbsmoke ~ mmarried + mage + medu + fbaby, family = binomial("logit"), 
##     data = birth_weight)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6037  -0.6002  -0.4817  -0.3744   2.3845  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.024681   0.244809   4.186 2.84e-05 ***
## mmarried    -1.036252   0.090000 -11.514  < 2e-16 ***
## mage        -0.003663   0.008388  -0.437    0.662    
## medu        -0.127238   0.016766  -7.589 3.23e-14 ***
## fbaby       -0.476541   0.087178  -5.466 4.59e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4461.5  on 4641  degrees of freedom
## Residual deviance: 4115.2  on 4637  degrees of freedom
## AIC: 4125.2
## 
## Number of Fisher Scoring iterations: 4

Using this model, we can now calculate the propensity score for each mother. It is simply the mother’s predicted probability of being Treated, given the estimates from the logit model. Below, we calculate this propensity score using fitted.values and add the predicted propensity score into the original dataframe.

birth_weight$PScores <- pscores.model$fitted.values

After estimating the propensity score, it is useful to plot histograms of the estimated propensity scores by treatment status. We create histograms showing the density of propensity score distribution in the treated and control groups.

hist(birth_weight$PScores[birth_weight$mbsmoke==1],main = "PScores of Response = 1")

hist(birth_weight$PScores[birth_weight$mbsmoke==0],main = "PScores of Response = 0")

3. The Matching Algorithms

We can now proceed with the matching algorithms with our ‘pscores.model’ and the estimated propensity scores. The matching algorithms create sets of participants for treatment and control groups. A matched set will consist of at least one person from the treatment group (i.e., the smoked mother) and one from the control group (i.e., the non-smoked mothers) with similar propensity scores. The basic goal is to approximate a random experiment, eliminating many of the problems that come with observational data analysis.

There are various matching algorithms in R, namely, exact matching, nearest neighbor, optimal matching, full matching and caliper matching. Let’s try the Nearest Neighbour Matching algorithm. Nearest Matching is a technique that matches individuals with controls (it could be two or more controls per treated unit) based on a distance. We use the package MatchIt for this. This package estimates the propensity score in the background and then matches observations based on the method of choice (“nearest” in this case).

match_nearest <- matchit(pscores.model, method="nearest", radio=1,data=birth_weight)

We can get some information about how successful the matching was using summary() and plot()

summary(match_nearest)

## 
## Call:
## matchit(formula = pscores.model, data = birth_weight, method = "nearest", 
##     radio = 1)
## 
## Summary of balance for all data:
##          Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean
## distance        0.2484        0.1719     0.1028    0.0765  0.0619   0.0764
## mmarried        0.4734        0.7515     0.4322   -0.2781  0.0000   0.2778
## mage           25.1667       26.8105     5.6455   -1.6438  2.0000   1.6863
## medu           11.6389       12.9299     2.5344   -1.2910  1.0000   1.2836
## fbaby           0.3715        0.4531     0.4979   -0.0816  0.0000   0.0810
##          eQQ Max
## distance  0.2293
## mmarried  1.0000
## mage      3.0000
## medu      6.0000
## fbaby     1.0000
## 
## 
## Summary of balance for matched data:
##          Means Treated Means Control SD Control Mean Diff eQQ Med eQQ Mean
## distance        0.2484        0.2483     0.1205    0.0001       0   0.0004
## mmarried        0.4734        0.4826     0.5000   -0.0093       0   0.0093
## mage           25.1667       24.9514     5.4384    0.2153       0   0.2616
## medu           11.6389       11.6053     2.3053    0.0336       0   0.0984
## fbaby           0.3715        0.3634     0.4813    0.0081       0   0.0081
##          eQQ Max
## distance  0.0162
## mmarried  1.0000
## mage      3.0000
## medu      5.0000
## fbaby     1.0000
## 
## Percent Balance Improvement:
##          Mean Diff. eQQ Med eQQ Mean eQQ Max
## distance    99.8182     100  99.4529 92.9373
## mmarried    96.6702       0  96.6667  0.0000
## mage        86.9038     100  84.4887  0.0000
## medu        97.4000     100  92.3354 16.6667
## fbaby       90.0739       0  90.0000  0.0000
## 
## Sample sizes:
##           Control Treated
## All          3778     864
## Matched       864     864
## Unmatched    2914       0
## Discarded       0       0

plot(match_nearest, type="hist")

To create a dataframe containing only the matched observations, we can use the match.data() function:

data_matched <- match.data(match_nearest)
dim(data_matched)

## [1] 1728    9

Let’s do a summary statistic for covariates in the matched sample

library(tableone)
table_match <- CreateTableOne(vars = bw_cov,strata = "mbsmoke",data = data_matched,test = FALSE)
print(table_match, smd = TRUE)

##                       Stratified by mbsmoke
##                        0            1            SMD   
##   n                      864          864              
##   mmarried (mean (SD))  0.48 (0.50)  0.47 (0.50)  0.019
##   mage (mean (SD))     24.95 (5.44) 25.17 (5.30)  0.040
##   medu (mean (SD))     11.61 (2.31) 11.64 (2.17)  0.015
##   fbaby (mean (SD))     0.36 (0.48)  0.37 (0.48)  0.017

The first thing we see above is that we matched 864 controls to 864 treated subjects and also we can see we get very small numbers for SMD in the covariates, hence we can conclude that we did a pretty good job of matching with nearest neighbour. We have a great balance in the dataset now to proceed with our further analysis.

4. Examining covariate balance in the matched sample

We’ll do more formal assessment for covariate balance in the matched sample by: visual inspection and t-tests of difference-in-means.

4.1 Visual Inspection

It is useful to plot the mean of each covariate against the estimated propensity score, separately by treatment status. If matching is done well, the treatment and control groups will have (near) identical means of each covariate at each value of the propensity score. Below is an example using these covariates in our model. Here I use a loess smoother to estimate the mean of each covariate, by treatment status, at each value of the propensity score.

fn_bal <- function(dta, variable) {
  dta$variable <- dta[, variable]
  dta$mbsmoke <- as.factor(dta$mbsmoke)
  support <- c(min(dta$variable), max(dta$variable))
  ggplot(dta, aes(x = distance, y = variable, color = mbsmoke)) +
    geom_point(alpha = 0.2, size = 1.3) +
    geom_smooth(method = "loess", se = F) +
    xlab("Propensity score") +
    ylab(variable) +
    theme_bw() +
    ylim(support)
}

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

grid.arrange(
  fn_bal(data_matched, "mmarried") + theme(legend.position = "none"),
  fn_bal(data_matched, "mage"),
  fn_bal(data_matched, "medu"),
  fn_bal(data_matched, "fbaby") + theme(legend.position = "none"),
  nrow = 3, widths = c(1, 0.8)
)

## Warning: Removed 72 rows containing missing values (geom_smooth).

## Warning: Removed 6 rows containing missing values (geom_smooth).

## Warning: Removed 65 rows containing missing values (geom_smooth).

4.2 T-tests of difference-in-means

We will test the difference in covariates between groups after matching using t-tests. Ideally, we should not be able to reject the null hypothesis of no mean difference for each covariate:

with(data_matched, t.test(mmarried ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  mmarried by mbsmoke
## t = 0.38507, df = 1726, p-value = 0.7002
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.03790285  0.05642137
## sample estimates:
## mean in group 0 mean in group 1 
##       0.4826389       0.4733796

with(data_matched, t.test(mage ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  mage by mbsmoke
## t = -0.83319, df = 1724.9, p-value = 0.4049
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.7220458  0.2914903
## sample estimates:
## mean in group 0 mean in group 1 
##        24.95139        25.16667

with(data_matched, t.test(medu ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  medu by mbsmoke
## t = -0.31178, df = 1719.5, p-value = 0.7552
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2447155  0.1775858
## sample estimates:
## mean in group 0 mean in group 1 
##        11.60532        11.63889

with(data_matched, t.test(fbaby ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  fbaby by mbsmoke
## t = -0.34909, df = 1726, p-value = 0.7271
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.05362164  0.03741794
## sample estimates:
## mean in group 0 mean in group 1 
##       0.3634259       0.3715278

5. Estimating treatment effects

We will now do a t test to test our hypothesis that there is an negative effect of smoking during pregnancy on birth weight of baby:

with(data_matched, t.test(bweight ~ mbsmoke))

## 
##  Welch Two Sample t-test
## 
## data:  bweight by mbsmoke
## t = 7.4043, df = 1698.7, p-value = 2.066e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  157.1736 270.4491
## sample estimates:
## mean in group 0 mean in group 1 
##        3351.471        3137.660

From the results above we see a very small p-value that makes the model highly significant. The birth weight of babies who have the mother smoked on average are 3137.66 gram, while the birth weight of babies who have the mother does not smoke on average are 3355.99 gram.

Further, we can do OLS regression with or without covariates using the matched sample: OLS Regression without covariates

ols1 <- lm(bweight ~ mbsmoke, data_matched)
summary(ols1)

## 
## Call:
## lm(formula = bweight ~ mbsmoke, data = data_matched)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3011.47  -316.02    48.53   376.59  1880.34 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3351.47      20.42 164.135  < 2e-16 ***
## mbsmoke      -213.81      28.88  -7.404 2.05e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 600.2 on 1726 degrees of freedom
## Multiple R-squared:  0.03079,    Adjusted R-squared:  0.03022 
## F-statistic: 54.82 on 1 and 1726 DF,  p-value: 2.051e-13

OLS Regression with covariates

ols2 <- lm(bweight ~ mbsmoke+mmarried + mage + medu + fbaby, data_matched)
summary(ols2)

## 
## Call:
## lm(formula = bweight ~ mbsmoke + mmarried + mage + medu + fbaby, 
##     data = data_matched)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3096.23  -301.59    34.19   369.30  1917.14 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3243.827     95.594  33.933  < 2e-16 ***
## mbsmoke     -212.276     28.688  -7.400 2.13e-13 ***
## mmarried     147.406     31.109   4.738 2.33e-06 ***
## mage          -3.024      3.044  -0.993    0.321    
## medu           8.948      6.833   1.309    0.191    
## fbaby         22.283     31.332   0.711    0.477    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 596 on 1722 degrees of freedom
## Multiple R-squared:  0.04652,    Adjusted R-squared:  0.04375 
## F-statistic:  16.8 on 5 and 1722 DF,  p-value: 3.04e-16