Rubin Causal Model

The Rubin Causal Model (RCM) is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes. RCM is named after its originator, Donald Rubin, Professor of Statistics at Harvard University.

Introduction
The Rubin Causal Model is based on the idea of potential outcomes and the assignment mechanism: every unit has different potential outcomes depending on its "assignment" to a condition. For instance, someone may have one income at age 40 if they attend a private college and a different income at age 40 if they attend a public college. To measure the causal effect of going to a public versus a private college, the investigator should look at the outcome for the same individual in both alternative futures. Since it is impossible to see both potential outcomes at once, one of the potential outcomes is always missing. A randomized experiment works by assigning people randomly to (in this case) public or private college; because the assignment was random, the groups are (on average) equivalent, and the difference in income at age 40 can be attributed to the college assignment since that was the only difference between the groups.

The assignment mechanism is the explanation for why some units received the treatment and others the control. In observational data, there is a non-random assignment mechanism: in the case of college attendance, people may choose to attend a private versus a public college based on their financial situation, parents' education, relative ranks of the schools they were admitted to, etc. If all of these factors can be balanced between the two groups of public and private college students, then the effect of the college attendance can be attributed to the college choice.

Many statistical methods have been developed for causal inference, such as propensity score matching and nearest-neighbor matching (which often uses the Mahalanobis metric, so may also be called Mahalanobis matching). These methods attempt to correct for the assignment mechanism by finding control units similar to treatment units. In the example, matching finds graduates of a public college most similar to graduates of a private college, so that like is compared only with like.
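As a minimal sketch of nearest-neighbor matching on a single covariate, the snippet below pairs each private-college graduate with the public-college graduate whose parents' income is closest. All numbers and names (`private_grads`, `public_grads`, `nearest_control`) are hypothetical, and a plain absolute difference stands in for the Mahalanobis metric, which would only matter with several covariates.

```python
from statistics import mean

# Hypothetical observational data: (parents' income, own income at age 40),
# both in thousands of dollars. All numbers are invented for illustration.
private_grads = [(90, 80), (60, 65), (120, 95)]           # "treated" units
public_grads = [(58, 60), (95, 78), (30, 45), (118, 90)]  # "control" units


def nearest_control(x, controls):
    """Return the control unit whose covariate is closest to x."""
    return min(controls, key=lambda unit: abs(unit[0] - x))


# Match each private-college graduate to the most similar public-college
# graduate (with replacement) and compare outcomes within each pair.
pair_differences = [
    y - nearest_control(x, public_grads)[1] for x, y in private_grads
]
matched_estimate = mean(pair_differences)

# Naive comparison of group means that ignores the covariate entirely.
naive_estimate = (
    mean(y for _, y in private_grads) - mean(y for _, y in public_grads)
)

print(matched_estimate, naive_estimate)
```

In this invented data the naive difference is larger than the matched one, because the unmatched public-college group includes a graduate with much lower parental income.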

Causal inference methods make few assumptions other than that one unit's outcomes are unaffected by another unit's treatment assignment, the stable unit treatment value assumption (SUTVA).

An extended example
Rubin defines a causal effect:

Intuitively, the causal effect of one treatment, E, over another, C, for a particular unit and an interval of time from $$t_1$$ to $$t_2$$ is the difference between what would have happened at time $$t_2$$ if the unit had been exposed to E initiated at $$t_1$$ and what would have happened at $$t_2$$ if the unit had been exposed to C initiated at $$t_1$$: 'If an hour ago I had taken two aspirins instead of just a glass of water, my headache would now be gone,' or 'because an hour ago I took two aspirins instead of just a glass of water, my headache is now gone.' Our definition of the causal effect of the E versus C treatment will reflect this intuitive meaning.

According to the RCM, the causal effect of your taking or not taking aspirin one hour ago is the difference between how your head would have felt in case 1 (taking the aspirin) and case 2 (not taking the aspirin). If your headache would remain without aspirin but disappear if you took aspirin, then the causal effect of taking aspirin is headache relief.

Suppose that Joe is participating in an FDA test for a new hypertension drug. If we are omniscient, we can see the outcomes for Joe under both the treatment (drug) and control (placebo) conditions and know the treatment effect.

$$Y_t(u)$$ is the change in Joe's blood pressure if he takes the pill. In general, this notation expresses the response of a unit, u, to a treatment, t. Similarly, $$Y_c(u)$$ is the response of a unit, u, to a different treatment, c or control. In this case, $$Y_c(u)$$ is the change in Joe's blood pressure if he doesn't take the pill. The difference $$Y_t(u) - Y_c(u)$$ is the causal effect of taking the drug.

From this table we only know the causal effect on Joe. Everyone else in the study might have an increase in blood pressure. However, regardless of what the causal effect is for the other subjects, the causal effect for Joe is a decrease in blood pressure.
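The notation can be made concrete with a short sketch. The class name `PotentialOutcomes` and the two numbers are hypothetical and serve only to illustrate the difference $$Y_t(u) - Y_c(u)$$ for one unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PotentialOutcomes:
    """Both potential outcomes for one unit (change in blood pressure, mm Hg)."""

    y_t: float  # response if the unit takes the drug
    y_c: float  # response if the unit takes the placebo

    def causal_effect(self) -> float:
        # Unit-level causal effect: Y_t(u) - Y_c(u)
        return self.y_t - self.y_c


# Hypothetical values for Joe, visible only to an omniscient observer.
joe = PotentialOutcomes(y_t=-10.0, y_c=-5.0)
print(joe.causal_effect())
```

A negative value here means that the drug lowers Joe's blood pressure relative to the placebo.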

Consider a larger sample of patients:

The causal effect is different for every subject, but the drug works for everyone because everyone's blood pressure decreases.

Stable unit treatment value assumption (SUTVA)
We require “the [potential outcome] observation on one unit should be unaffected by the particular assignment of treatments to the other units” (Cox 1958, §2.4). This is called the Stable Unit Treatment Value Assumption (SUTVA), which goes beyond the concept of independence.

In the context of our example, Joe's change in blood pressure must not depend on whether or not Mary receives the drug. To see how this can fail, suppose that Joe and Mary live in the same house. Mary always cooks. If Mary does not take the drug she will not cook salty foods, but if she does take the drug she will cook salty foods. A high-salt diet increases Joe's blood pressure. Therefore, his response will depend on which treatment Mary receives. (In a blinded trial Mary would not know whether she received the drug or the placebo, so this example presumes that she knows her assignment.)

SUTVA violation makes causal inference more difficult. We can account for dependent observations by considering more treatments. We create 4 treatments by taking into account whether or not Mary receives treatment.

Now there are multiple causal effects. One is the causal effect of the drug on Joe when Mary receives treatment, calculated as $$10 - 20$$. Another is the causal effect of the drug on Joe when Mary does not receive treatment, calculated as $$0 - 5$$. The third is the causal effect of Mary's treatment on Joe, calculated as $$20 - 5$$. The treatment Mary receives has a greater causal effect on Joe than the assignment of treatment to Joe.
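The three subtractions can be reproduced with a small lookup table. The four responses are taken from the calculations quoted in the text; which combination of treatments each value belongs to is inferred from those subtractions and should be read as an assumption, as are the function names.

```python
# Joe's response (change in blood pressure) indexed by
# (Joe's treatment, Mary's treatment). The four values come from the
# subtractions quoted in the text; their placement is an assumption.
joe_response = {
    ("drug", "drug"): 10,
    ("control", "drug"): 20,
    ("drug", "control"): 0,
    ("control", "control"): 5,
}


def effect_of_drug_on_joe(mary: str) -> int:
    """Effect of Joe's own treatment, holding Mary's treatment fixed."""
    return joe_response[("drug", mary)] - joe_response[("control", mary)]


def effect_of_mary_on_joe(joe: str) -> int:
    """Effect of Mary's treatment on Joe, holding Joe's treatment fixed."""
    return joe_response[(joe, "drug")] - joe_response[(joe, "control")]


drug_effect_mary_treated = effect_of_drug_on_joe("drug")        # 10 - 20
drug_effect_mary_untreated = effect_of_drug_on_joe("control")   # 0 - 5
mary_effect_joe_untreated = effect_of_mary_on_joe("control")    # 20 - 5

print(drug_effect_mary_treated, drug_effect_mary_untreated,
      mary_effect_joe_untreated)
```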

With additional treatments, SUTVA holds. However, if any units other than Joe are dependent on Mary, then we must consider further treatments. The greater the number of dependent units, the more treatments we must consider and the more complex the calculations become (consider an experiment with 20 different causal effects). In order to determine the causal effect using only two treatments, the observations must be independent.

Consider an example where not all subjects benefit from the drug.

One may calculate the average causal effect by taking the mean of all the causal effects or by subtracting the mean change under control from the mean change under treatment. Although the average causal effect is a decrease in blood pressure, the causal effect for Joe is an increase in blood pressure. Joe would never want to take the drug.
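The two ways of computing the average causal effect agree, as the following sketch illustrates. The potential outcomes are hypothetical numbers chosen so that Joe's own effect is an increase while the average is a decrease.

```python
from math import isclose
from statistics import mean

# Hypothetical potential outcomes (change in blood pressure, mm Hg).
# The names follow the text; the numbers are invented for illustration.
outcomes = {
    "Joe": {"y_t": 5.0, "y_c": 0.0},
    "Mary": {"y_t": -10.0, "y_c": 0.0},
    "Susie": {"y_t": -7.0, "y_c": 2.0},
}

# Method 1: mean of the unit-level causal effects.
unit_effects = {name: o["y_t"] - o["y_c"] for name, o in outcomes.items()}
ace_from_effects = mean(unit_effects.values())

# Method 2: mean response under treatment minus mean response under control.
ace_from_means = (
    mean(o["y_t"] for o in outcomes.values())
    - mean(o["y_c"] for o in outcomes.values())
)

print(unit_effects["Joe"], ace_from_effects, ace_from_means)
print(isclose(ace_from_effects, ace_from_means))
```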

How we measure the response affects what inferences we draw. Suppose that we measure changes in blood pressure as a percentage change:

This measurement suggests the opposite conclusion, that the average causal effect is an increase in blood pressure. One obtains this result because the positive change in blood pressure for Joe is a larger percentage of his blood pressure. This would occur if Joe's blood pressure is lower than the blood pressure of the other subjects. For example, if Joe's blood pressure is 140 mm Hg and increases by 14 mm Hg, that is an increase of 10%. If Mary's blood pressure is 200 mm Hg and increases by 14 mm Hg, then her blood pressure increases by only 7%. Consequently, the same absolute change in blood pressure yields a larger percentage change for Joe.
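The arithmetic can be checked directly, and a two-person sketch shows how the sign of the average can flip between scales. Joe's baseline and change are the figures from the text; Mary's change of -16 mm Hg is a hypothetical value picked so that the reversal appears.

```python
# (baseline blood pressure, change), both in mm Hg. Joe's values come from
# the text; Mary's change of -16 is hypothetical.
subjects = {"Joe": (140, 14), "Mary": (200, -16)}

absolute = {name: change for name, (_, change) in subjects.items()}
percent = {
    name: 100 * change / baseline
    for name, (baseline, change) in subjects.items()
}

mean_absolute = sum(absolute.values()) / len(absolute)
mean_percent = sum(percent.values()) / len(percent)

# The same 14 mm Hg rise on a baseline of 200 mm Hg, as in the text.
rise_on_200 = 100 * 14 / 200

print(percent["Joe"], rise_on_200)
print(mean_absolute, mean_percent)
```

On the absolute scale the average is a decrease, while on the percentage scale it is an increase, because Joe's rise is measured against a lower baseline.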

The fundamental problem of causal inference
The results we have seen up to this point would never be observed in practice. It is impossible to observe the effect of more than one treatment on a subject at one time. Joe cannot both take the pill and not take the pill at the same time. Therefore, the data would look something like this:

Question marks are responses that could not be observed. Some scholars call the impossibility of observing responses to multiple treatments on the same subject over a given period of time the Fundamental Problem of Causal Inference. The FPCI makes observing causal effects impossible. However, this does not make causal inference impossible. Certain techniques and assumptions allow the FPCI to be overcome.
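A sketch of what such data look like in code may help. The responses below are hypothetical, and `None` plays the role of the question marks: no unit-level effect can be computed, yet a difference between group means still can.

```python
from statistics import mean

# Each row: (unit, response under treatment, response under control).
# None marks the potential outcome that was never observed.
# All numbers are hypothetical.
observed = [
    ("Joe", -10.0, None),
    ("Mary", None, 2.0),
    ("Susie", -4.0, None),
    ("Luke", None, 0.0),
]


def unit_effect(y_t, y_c):
    """Return Y_t(u) - Y_c(u), or None when either outcome is missing."""
    if y_t is None or y_c is None:
        return None
    return y_t - y_c


unit_effects = [unit_effect(y_t, y_c) for _, y_t, y_c in observed]

# A group-level comparison remains possible with the observed responses.
difference_in_means = (
    mean(y_t for _, y_t, _ in observed if y_t is not None)
    - mean(y_c for _, _, y_c in observed if y_c is not None)
)

print(unit_effects, difference_in_means)
```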

Suppose that we want to determine the causal effect of the drug on Joe. The FPCI makes it impossible to observe the causal effect so we must determine the average causal effect instead. To do this, we could instruct Joe to repeat the experiment each month for 6 consecutive months. At the beginning of each month, we would flip a coin to determine which treatment he receives. The results of this experiment follow:

Suppose that Joe could only choose to take the drug for all 6 months or not take the drug at all. During one of the months Joe's blood pressure increases when he takes the drug. However, it could have been even higher if he had not taken the drug. Joe would, on average, benefit from the drug because the average causal effect is a decrease in blood pressure. Even if he knew that he would be better off not taking the drug in February, it would most likely be in his overall interest to choose the drug for the entire duration of the study.

In order for us to conclude that the average causal effect of the pill is a decrease in Joe's blood pressure, we must make certain assumptions. Joe's responses must be independent of each other. Joe's response during any month must not be affected by the treatments he receives during any other month. His taking the drug in January should not affect his response to the control in February. If this assumption does not hold, perhaps because the drug remains in the bloodstream, we would have to consider multiple treatments. By making each treatment a combination of the treatment Joe received the previous month and the treatment he would receive the following month, we would create 4 treatments:

Using these different treatments would restore independence. However, as responses become dependent on more than one treatment assignment, the number of treatments becomes exponentially greater, and determining average causal effect becomes more complex. In this example, we would have to determine three different causal effects. The first is the causal effect of the drug on Joe when Joe takes the drug the month before. The second is the causal effect of the drug on Joe when Joe does not take the drug the month before. The third is the causal effect of taking the drug on Joe when he does not take the

We can infer what Joe's response to the unobserved treatment would be if we make an assumption of constant effect. This means that the causal effect is the same at different times, no different in March than it is in April. If the causal effect is always the same, then the average causal effect equals the causal effect. Therefore, knowing the average causal effect, denoted T, and observing one response, we can calculate the other response:


 * $$Y_t(u) = T+Y_c(u)$$

and


 * $$Y_t(u) - T = Y_c(u).$$

Since the average causal effect for Joe is a reduction in blood pressure, an assumption of constant effect suggests that the drug would always reduce his blood pressure.
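Under the constant-effect assumption, the two identities above can be used to fill in the unobserved response for each month. The sketch below uses a hypothetical six-month record, estimates the constant effect (called `T` here) as the difference between the mean observed responses, and imputes the missing values. The function name `impute` is illustrative only.

```python
from math import isclose
from statistics import mean

# Hypothetical six-month record for Joe: (month, treatment, observed change).
record = [
    (1, "drug", -8.0),
    (2, "control", 2.0),
    (3, "drug", -6.0),
    (4, "control", 0.0),
    (5, "control", 1.0),
    (6, "drug", -7.0),
]

# Estimated average causal effect: mean under drug minus mean under control.
T = (
    mean(y for _, trt, y in record if trt == "drug")
    - mean(y for _, trt, y in record if trt == "control")
)


def impute(treatment, y):
    """Return (Y_t, Y_c) for one month, filling in the unobserved response."""
    if treatment == "drug":
        return y, y - T   # Y_c = Y_t - T
    return T + y, y       # Y_t = T + Y_c


completed = {month: impute(trt, y) for month, trt, y in record}
print(T, completed)
```

Every completed month then shows the same effect `T`, which is exactly what the assumption asserts rather than something the data can confirm.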

Multiple subjects
Another way to determine average causal effect is to use multiple subjects:

Mary's and Susie's blood pressures increase when they take the drug. We do not know the causal effect of the drug on Susie or Mary because we do not know their responses under control.

If we wanted to infer the unobserved values we could make an assumption of either constant effect or homogeneity, an even stronger assumption than constant effect. If the subjects are all the same, or homogeneous, then they would all have the same response to the treatment and the same response to the control. Mathematically, $$Y_{t1}(u) = Y_{t2}(u)$$ and $$Y_{c1}(u) = Y_{c2}(u)$$, where 1 and 2 are units being tested for homogeneity. As the causal effect equals $$Y_t(u) - Y_c(u)$$, the causal effect would be the same for all of them. The following tables illustrate data that support assumptions of constant effect, homogeneity, or both:

All of the subjects have the same causal effect even though they have different responses to the treatments. This data supports the assumption of constant effect, but does not support the assumption of homogeneity.

These subjects have the same responses to the treatment and consequently, the same causal effect. This makes them homogeneous. This data supports the assumptions of both constant effect and homogeneity.

If the assumption of homogeneity holds, then the average causal effect equals the causal effect for every unit. Knowing the average causal effect and having observed the response to one treatment for each unit, one can determine the response to the other treatment. One cannot apply this assumption to the data in this example because the responses are different for every subject.
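The difference between the two assumptions can be stated as two small checks. The data sets and function names below are hypothetical; each unit is written as a pair (response under treatment, response under control).

```python
def has_constant_effect(units):
    """True if every unit has the same causal effect Y_t - Y_c."""
    return len({y_t - y_c for y_t, y_c in units}) == 1


def is_homogeneous(units):
    """True if every unit has the same Y_t and the same Y_c."""
    return len(set(units)) == 1


# Hypothetical data: same effect (-5) but different responses.
constant_only = [(-5, 0), (-3, 2), (-10, -5)]

# Hypothetical data: identical responses for every unit.
homogeneous = [(-5, 0), (-5, 0), (-5, 0)]

# Hypothetical data: effects differ, so neither assumption is supported.
neither = [(-5, 0), (4, 1), (-2, -2)]

for units in (constant_only, homogeneous, neither):
    print(has_constant_effect(units), is_homogeneous(units))
```

Homogeneity implies constant effect, but not the reverse, which is why it is the stronger assumption.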

The assignment mechanism
The assignment mechanism, the method by which units are assigned treatment, affects the calculation of the average causal effect. One such assignment mechanism is randomization. For each subject we could flip a coin to determine if she receives treatment. If we wanted five subjects to receive treatment, we could assign treatment to the first five names we pick out of a hat. When we randomly assign treatments we may get different answers.

This is the true average causal effect. Assigning treatments randomly, we calculate another causal effect.

Another random assignment of treatments yields yet another average causal effect.

The average causal effect varies because our sample is small and the responses have a large variance. If the sample were larger and the variance were less, the average causal effect would be closer to the true average causal effect.
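The variation across random assignments can be illustrated by enumerating every way of treating three out of six units. The potential outcomes below are hypothetical. Under this complete randomization, individual estimates differ from the true average causal effect, while their average over all possible assignments matches it.

```python
from itertools import combinations
from math import isclose
from statistics import mean

# Hypothetical potential outcomes for six units (change in mm Hg).
y_t = [-10, -4, 6, -8, 0, -2]
y_c = [0, 2, 1, -3, 4, -1]
units = range(len(y_t))

# Known only to an omniscient observer.
true_ace = mean(y_t) - mean(y_c)


def estimate(treated):
    """Difference in observed means for one assignment of treatment."""
    treated = set(treated)
    return (
        mean(y_t[i] for i in treated)
        - mean(y_c[i] for i in units if i not in treated)
    )


# Every possible assignment of treatment to exactly three units.
estimates = [estimate(t) for t in combinations(units, 3)]

print(true_ace, min(estimates), max(estimates), mean(estimates))
```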

Another assignment mechanism might assign the treatment only to men.

Under this assignment mechanism, it is impossible for women to receive treatment and therefore impossible to determine the average causal effect on female subjects. In order to make any inferences of causal effect on a subject, the probability that the subject receives treatment must be greater than 0 and less than 1.
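The condition on the assignment probability is easy to express as a check. In the sketch below, the men-only mechanism follows the text, while the probability of 0.5 for men and both function names are assumptions made for illustration.

```python
def assignment_probability(sex: str) -> float:
    """Men-only mechanism; the 0.5 for men is an assumed value."""
    return 0.5 if sex == "male" else 0.0


def effect_is_estimable(p: float) -> bool:
    """Inference requires a treatment probability strictly between 0 and 1."""
    return 0.0 < p < 1.0


for sex in ("male", "female"):
    p = assignment_probability(sex)
    print(sex, p, effect_is_estimable(p))
```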

The perfect doctor
Consider the use of the perfect doctor as an assignment mechanism. The perfect doctor knows how each subject will respond to the drug or the control and assigns each subject to the treatment that will most benefit her. The perfect doctor knows this information about a sample of patients:

Based on this knowledge she would make the following treatment assignments:

The perfect doctor distorts the average causal effect by filtering out poor responses to either treatment. As in the case of Luke, the control sometimes works better than the drug. The perfect doctor would assign the control to Luke, and his response would contribute to the positive average of the control. Consequently, Luke's response makes the average causal effect appear more negative, the opposite of the true causal effect on Luke. The perfect doctor's selective assignment of treatments will make the causal effect of the drug appear greater in magnitude.
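The distortion can be seen in a small sketch. The potential outcomes are hypothetical, with Luke given a better response under control as in the text, and the doctor is modelled as assigning whichever treatment yields the lower change in blood pressure.

```python
from statistics import mean

# Hypothetical potential outcomes: name -> (Y_t, Y_c), change in mm Hg.
patients = {
    "Joe": (-10, 0),
    "Mary": (-8, 1),
    "Luke": (9, 3),
    "Susie": (5, 1),
}

# True average causal effect, using both potential outcomes of every unit.
true_ace = (
    mean(y_t for y_t, _ in patients.values())
    - mean(y_c for _, y_c in patients.values())
)

# The perfect doctor gives the drug only to those who respond better to it.
gets_drug = {name: y_t < y_c for name, (y_t, y_c) in patients.items()}

# Only the response to the assigned treatment is observed.
treated_mean = mean(patients[n][0] for n, drug in gets_drug.items() if drug)
control_mean = mean(patients[n][1] for n, drug in gets_drug.items() if not drug)
observed_difference = treated_mean - control_mean

print(true_ace, observed_difference)
```

With these numbers the observed difference is several times larger in magnitude than the true average causal effect, because poor responses to each treatment never enter the observed averages.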

Matching
Another way to estimate causal effect is matching. The goal of matching is to pair homogeneous or at least similar units. One way to match is to pair units with similar or the same attributes and randomly assign one of the units treatment and the other unit control.

If matched units are homogeneous, then they have the same causal effect. This means that they have the same average causal effect. Therefore, if all units are perfectly matched, the average causal effect equals the causal effect.
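A minimal sketch of matched-pair assignment follows. It pairs units with the same age, flips a coin within each pair, and averages the within-pair differences. The units, their ages, and their potential outcomes are hypothetical, and the units within each pair are constructed to be perfectly homogeneous.

```python
import random
from statistics import mean

# Hypothetical units: (name, age, Y_t, Y_c). Units of the same age are
# constructed to be homogeneous.
units = [
    ("A", 30, -6, 0),
    ("C", 50, -2, 2),
    ("B", 30, -6, 0),
    ("D", 50, -2, 2),
]

rng = random.Random(0)  # seeded so the run is reproducible

# Pair units with the same (or adjacent) value of the matching attribute.
by_age = sorted(units, key=lambda u: u[1])
pairs = [by_age[i:i + 2] for i in range(0, len(by_age), 2)]

pair_estimates = []
for pair in pairs:
    # Randomly pick which member of the pair is treated.
    treated, control = rng.sample(pair, 2)
    # Observed: treated unit's Y_t and control unit's Y_c.
    pair_estimates.append(treated[2] - control[3])

matched_estimate = mean(pair_estimates)
true_ace = mean(u[2] - u[3] for u in units)

print(pair_estimates, matched_estimate, true_ace)
```

Because the pairs are perfectly matched here, the estimate equals the true average causal effect regardless of the coin flips. With merely similar units the result would depend on the assignment.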

Conclusion
The causal effect of a treatment on a single unit at a point in time is the difference between the outcome variable with the treatment and without the treatment. The Fundamental Problem of Causal Inference is that it is impossible to observe the causal effect on a single unit. You either take the aspirin now or you don't. As a consequence, assumptions must be made in order to estimate the missing counterfactuals.

Relations to other approaches
Pearl (2000) has shown the equivalence between the Rubin Causal Model (RCM) and the Structural Equation Model (SEM) used in econometrics and the social sciences. The equivalence rests on defining the "potential outcome" variable $$Y_x(u)$$ to be the solution for variable Y, under the conditions that (1) the exogenous variables U assume the values u and (2) the equation that determines the value of X is replaced by the constant equation X = x. With this interpretation, every theorem in RCM is a theorem in SEM and vice versa. This equivalence has led to a complete axiomatization of RCM and a complete solution to the identification of causal effects, using graphs (Shpitser and Pearl 2006). Moreover, the assumptions that are normally needed for inference in RCM can be read directly from the graphical representation of SEM.