Propensity score

Overview
In medical research, observational studies do not allow investigators to control treatment assignment. As a result, covariates (e.g., age, sex) may differ significantly between treatment groups, which can bias estimates of treatment effects.

For example, when comparing outcomes between patients in a study who received a lung transplant and those who did not, the lung transplant cohort may have been older and had a lower body weight than the cohort that did not receive a transplant. Because the two cohorts were unbalanced on these two covariates, the estimated effect of lung transplantation may be biased.

Propensity scores help to reduce this bias.

Definition
In the analysis of treatment effects, suppose that we have a binary treatment T, an outcome Y, and background variables X. The propensity score is defined as the conditional probability of treatment given background variables:


 * $$p(x) \ \stackrel{\mathrm{def}}{=}\ \Pr(T=1 \mid X=x).$$
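When X takes only a few discrete values, the definition can be read off directly as the fraction of treated subjects at each value of x. The following is a minimal sketch of that idea; the function name `empirical_propensity` and the toy records are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def empirical_propensity(records):
    """Estimate p(x) = Pr(T=1 | X=x) as the treated fraction within each x.

    records: iterable of (x, t) pairs, where t is 0 (control) or 1 (treated).
    """
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, t in records:
        total[x] += 1
        treated[x] += t
    return {x: treated[x] / total[x] for x in total}


# Hypothetical toy data: (age group, treatment indicator).
records = [
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
]
p = empirical_propensity(records)
print(p)
```

With continuous or numerous covariates this counting approach breaks down, which is why a model for the treatment probability is normally fitted instead.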

The propensity score was introduced by Rosenbaum and Rubin (1983) to provide an alternative method for estimating treatment effects when treatment assignment is not random, but can be assumed to be unconfounded. Let Y(0) and Y(1) denote the potential outcomes under control and treatment, respectively. Then treatment assignment is (conditionally) unconfounded if treatment is independent of potential outcomes conditional on X. This can be written compactly as


 * $$T \perp (Y(0), Y(1)) \mid X,$$

where $$\perp$$ denotes statistical independence.

Rosenbaum and Rubin showed that if unconfoundedness holds, then


 * $$T \perp (Y(0), Y(1)) \mid p(X).$$
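
One consequence of this result is that the average treatment effect can be expressed through observed outcomes compared at equal values of the propensity score. This is sketched here under the additional assumption of overlap, 0 < p(x) < 1 for all x, which is not stated in the text above. Writing Y = T Y(1) + (1 - T) Y(0) for the observed outcome,


 * $$E[Y(1) - Y(0)] = E\big[\, E[Y \mid T=1, p(X)] - E[Y \mid T=0, p(X)] \,\big],$$

where the outer expectation is taken over the distribution of p(X). The step that uses the result is $$E[Y \mid T=1, p(X)] = E[Y(1) \mid T=1, p(X)] = E[Y(1) \mid p(X)]$$, and similarly for the control group.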

It is difficult in practice to use the definition above to determine whether unconfoundedness holds in any specific situation; Pearl (2000) has shown that a simple graphical criterion, called the back-door criterion, provides an equivalent definition of unconfoundedness.

Application
There are three commonly used ways to incorporate propensity scores into an analysis of treatment effects: matching, stratification, and regression adjustment. In each of these methods, the propensity score is estimated in the same manner, but the way in which the score is used varies. One common way to estimate the propensity score is with a logistic regression of treatment on clinically relevant or statistically significant baseline covariates. The advantage of computing a propensity score from this logistic regression is that it provides a quasi-randomized way of comparing the treatment group to the control group. When subjects are paired on the propensity score, both members of a pair had approximately the same estimated probability of receiving the treatment.
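To make the estimation step concrete, the sketch below fits a logistic regression of treatment on two covariates and returns each subject's fitted probability as the propensity score. It is an illustration only: the plain gradient-ascent fitting routine, the learning rate, the iteration count, and the toy data (centred age and a 0/1 sex indicator) are all assumptions made here, and in practice a statistical package would be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(beta, x):
    """Intercept beta[0] plus the dot product of beta[1:] with the covariates."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logistic(X, t, lr=0.5, n_iter=20000):
    """Fit Pr(T=1 | x) = sigmoid(b0 + b.x) by batch gradient ascent
    on the mean log-likelihood. Returns [b0, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, ti in zip(X, t):
            resid = ti - sigmoid(linear_predictor(beta, x))
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, x):
    """Fitted probability of treatment for covariate vector x."""
    return sigmoid(linear_predictor(beta, x))


# Hypothetical toy data: columns are (centred age in decades, sex coded 0/1).
X = [[-1.0, 0], [0.0, 0], [0.2, 0], [0.8, 0], [1.0, 0],
     [-1.5, 1], [-0.5, 1], [0.5, 1], [1.5, 1]]
t = [0, 1, 1, 0, 1, 0, 0, 0, 1]

beta = fit_logistic(X, t)
scores = [propensity(beta, x) for x in X]
print([round(s, 3) for s in scores])
```

Each subject ends up with a single number in (0, 1), regardless of how many covariates entered the model; the three methods below differ only in how that number is used.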

Matching
One-to-one matching on the covariates themselves can be difficult, especially when there are numerous covariates to match. It is easier to match using the propensity score because it is a single scalar assigned to each patient that summarizes all covariates in the model.

The first step is to match each treated subject with a control subject based on their respective propensity scores. Exact matching of scores is rarely possible, so an acceptable range of values (a caliper) must be chosen. According to Rosenbaum and Rubin, a quarter of a standard deviation of the logit of the propensity score is an appropriate range. Once the subjects are paired, the means of the baseline covariates in the treatment and control groups are compared before and after matching. If the group means are more similar after matching than before, the propensity matching has improved covariate balance and thereby reduced the bias in the estimated treatment effect.
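The pairing step described above can be sketched as a greedy one-to-one nearest-neighbour match on the logit of the propensity score, accepting a pair only if it falls within a quarter of a standard deviation. Several details are assumptions made for this illustration rather than part of the method as described: the standard deviation is computed over all subjects combined, treated subjects are processed in order of increasing logit, and the subject identifiers and scores are invented.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def caliper_match(treated, controls, caliper_sd=0.25):
    """Greedy 1:1 nearest-neighbour matching on the logit of the propensity score.

    treated, controls: dicts mapping subject id -> propensity score in (0, 1).
    Returns a list of (treated_id, control_id) pairs; each control is used once.
    """
    logit_t = {i: logit(p) for i, p in treated.items()}
    logit_c = {i: logit(p) for i, p in controls.items()}
    # Assumption: caliper is based on the SD of the logit over all subjects.
    sd = statistics.stdev(list(logit_t.values()) + list(logit_c.values()))
    caliper = caliper_sd * sd

    available = dict(logit_c)
    pairs = []
    for tid, lt in sorted(logit_t.items(), key=lambda kv: kv[1]):
        if not available:
            break
        cid = min(available, key=lambda c: abs(available[c] - lt))
        if abs(available[cid] - lt) <= caliper:
            pairs.append((tid, cid))
            del available[cid]  # match without replacement
    return pairs


# Hypothetical propensity scores.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.95}
controls = {"c1": 0.78, "c2": 0.50, "c3": 0.20, "c4": 0.30}

pairs = caliper_match(treated, controls)
print(pairs)
```

In this toy example the treated subject with the highest score has no control within the caliper and is left unmatched, which illustrates that caliper matching can discard subjects in regions where the two groups do not overlap.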

Stratification
Another way to use the propensity score is through stratification. With this method, the propensity score is calculated and subjects are then divided into groups (strata) based on its value. Rosenbaum and Rubin suggest stratifying the propensity score into quintiles because this usually eliminates over 90% of the bias in each covariate. Means of the baseline characteristics for treated subjects and controls are compared before and after stratification. For the post-stratification comparison of means, an adjustment is made using a categorical variable representing the propensity quintiles. One way to determine an overall treatment effect is to model the outcome predicted by treatment separately within each quintile and then combine the quintile-specific estimates. Another way is to model the outcome predicted by treatment and either the raw propensity score or the propensity quintile. A subset of covariates can also be included in this model.
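The stratify-then-combine idea can be illustrated with a small sketch that assigns subjects to five equal-sized strata by rank of propensity score, takes the difference in mean outcome within each stratum, and averages those differences weighted by stratum size. The use of a simple difference in means as the within-stratum model, the size weighting, and the toy data are assumptions chosen for illustration.

```python
def quintile_strata(scores):
    """Assign each subject to one of five equal-sized strata (0-4) by score rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // len(scores)
    return strata


def stratified_effect(scores, treatment, outcome):
    """Size-weighted average of within-stratum differences in mean outcome."""
    strata = quintile_strata(scores)
    total, weight = 0.0, 0
    for s in range(5):
        y1 = [y for k, t, y in zip(strata, treatment, outcome) if k == s and t == 1]
        y0 = [y for k, t, y in zip(strata, treatment, outcome) if k == s and t == 0]
        if y1 and y0:  # strata lacking either group contribute nothing
            n_s = len(y1) + len(y0)
            total += n_s * (sum(y1) / len(y1) - sum(y0) / len(y0))
            weight += n_s
    return total / weight


# Hypothetical data: 15 subjects sorted by score; treatment is more common
# at high scores, and the outcome rises with the score as well as with treatment.
scores = [0.05 + 0.06 * i for i in range(15)]
treatment = [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0]
outcome = [1, 1, 3, 2, 2, 4, 3, 5, 3, 6, 6, 4, 7, 7, 5]

treated_y = [y for t, y in zip(treatment, outcome) if t == 1]
control_y = [y for t, y in zip(treatment, outcome) if t == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
adjusted = stratified_effect(scores, treatment, outcome)
print(naive, adjusted)
```

The toy outcomes were constructed so that treatment adds exactly 2 within every stratum; the unadjusted comparison overstates this because treated subjects are concentrated in the high-score strata.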

Regression Adjustment
Regression adjustment is also a useful way to incorporate propensity scoring. With this method, a regression of treatment on a large set of background covariates is performed to obtain the propensity score. Once the propensity scores are obtained, another regression, of the outcome on treatment group and propensity score, is used to analyze the treatment effect. A subset of important covariates can also be included in this model. Both models, the one with the subset of covariates and the one without, should yield similar conclusions. Stratification and regression adjustment can be combined and may produce more accurate results than any single method above.
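The second-stage regression can be sketched with ordinary least squares of a continuous outcome on the treatment indicator and the propensity score. Ordinary least squares is an assumption made to keep the example small (a survival outcome would call for a different model), the helper functions `solve` and `ols` are written out here purely for illustration, and the propensity scores are taken as given rather than estimated.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients of y on the columns of rows.

    An intercept is added, so the result is [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical data: the outcome depends on both treatment and propensity score.
scores = [0.05 + 0.06 * i for i in range(15)]
treatment = [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0]
outcome = [1.0 + 2.0 * t + 10.0 * p for t, p in zip(treatment, scores)]

coef = ols(list(zip(treatment, scores)), outcome)
treated_y = [y for t, y in zip(treatment, outcome) if t == 1]
control_y = [y for t, y in zip(treatment, outcome) if t == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(naive, coef[1])
```

Here `coef[1]` is the treatment coefficient after adjusting for the propensity score. Because the toy outcome was generated with a treatment effect of 2, the adjusted coefficient recovers that value while the unadjusted difference in means does not.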

Stata Code
For the purposes of these examples, the data is entered as one line per subject.

Generating Propensity Score
xi: logistic lungtransplant age sex bmi

predict propensity

 * Note: Now we divide the propensity score into ranges to match on. To develop the ranges, look at the distribution of the propensity values.

gen propensity_class=1 if propensity<0.1

replace propensity_class=2 if 0.1<=propensity & propensity<0.2

replace propensity_class=3 if 0.2<=propensity & propensity<0.3

replace propensity_class=4 if 0.3<=propensity & propensity<0.4

replace propensity_class=5 if 0.4<=propensity & propensity<0.5

replace propensity_class=6 if 0.5<=propensity & propensity<0.6

replace propensity_class=7 if 0.6<=propensity & propensity<0.7

replace propensity_class=8 if 0.7<=propensity & propensity<0.8

replace propensity_class=9 if 0.8<=propensity & propensity<0.9

replace propensity_class=10 if 0.9<=propensity & propensity<=1

save "c:\transplant.dta", replace

use "c:\transplant.dta", clear

keep if lungtransplant==1

sort propensity_class

save "c:\transplant_yes.dta", replace

use "c:\transplant.dta", clear

keep if lungtransplant==0

rename id id_no

rename lungtransplant lungtransplant_no

rename age age_no

rename sex sex_no

rename bmi bmi_no

sort propensity_class

save "c:\transplant_no.dta", replace

merge propensity_class using "c:\transplant_yes.dta"

tab _merge

keep if _merge==3

drop _merge

save "c:\matched_cohort.dta", replace

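Now the dataset has one row per matched pair: the treated subject's variables keep their original names and the control subject's variables carry the _no suffix. The following steps restack it to one row per subject: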

use "c:\matched_cohort.dta", clear

keep id lungtransplant age sex bmi

save "c:\matched_cohort_yes.dta", replace

use "c:\matched_cohort.dta", clear

keep id_no lungtransplant_no age_no sex_no bmi_no

rename id_no id

rename lungtransplant_no lungtransplant

rename age_no age

rename sex_no sex

rename bmi_no bmi

save "c:\matched_cohort_no.dta", replace

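append using "c:\matched_cohort_yes.dta"

merge m:1 id using "c:\transplant.dta", keepusing(days2death death) keep(match) nogenerate
 * Note: The keep statements above drop days2death and death, so they are merged back in by id before the survival analysis. The m:1 syntax requires Stata 11 or later.

stset days2death, failure(death)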

stcox lungtransplant

Generating Propensity Score
pscore lungtransplant age sex bmi, pscore(mypscore) blockid(myblock)

Incorporating Propensity Score Stratification in the Model
stset days2death, failure(death)

stcox lungtransplant, strata(myblock)

Generating Propensity Score
logistic lungtransplant age sex bmi

predict propensity

Incorporating Propensity Score in the Model
stset days2death, failure(death)

stcox lungtransplant propensity
 * Model 1: Death predicted by lung transplant status (0/1) and propensity score

stcox lungtransplant age sex bmi propensity
 * Model 2: Death predicted by lung transplant status (0/1), propensity score, and a set or subset of important covariates

Additional Resources

 * D'Agostino, R. B. (2007). "Propensity Scores in Cardiovascular Research," Circulation, 115, 2340-2343.
 * D'Agostino, R. B. (1998). "Tutorial in Biostatistics: Propensity Score Methods for Bias Reduction in the Comparison of a Treatment to a Non-randomized Control Group," Statistics in Medicine, 17, 2265-2281.
 * Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.
 * Rosenbaum, P. R., and Rubin, D. B. (1983). "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41-55.