Regression toward the mean

Regression toward the mean refers to the fact that those with extreme scores on any measure at one point in time will, for purely statistical reasons, probably have less extreme scores the next time they are tested. Scores always involve some luck, and an extreme score usually includes luck that happened to fall in the scorer's favor (for an extremely high score) or against them (for an extremely low one).

Consider an extreme example: a class of students takes a 100-item true/false test on a subject about which none of the students knows anything at all. All students therefore choose randomly on every question, leading to a mean score of about 50. Naturally, some students will score substantially above 50 and some substantially below 50 just by chance. If one takes only the top 10% of the students and gives them a parallel form of the test, on which they again guess on all items, their mean score would be expected to be close to 50. Thus the mean of these students would "regress" all the way back to the mean of all students who took the original test. No matter what a student scores on the original test, the best prediction of their score on the parallel form is 50.
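
The pure-guessing example can be checked with a small simulation. The following is a minimal sketch rather than a definitive model: the function name `simulate_guessing`, the class size of 5000 and the fixed random seed are assumptions chosen for illustration, not part of the example above.

```python
import random
import statistics


def simulate_guessing(n_students=5000, n_items=100, top_fraction=0.10, seed=0):
    """Simulate a true/false test and a parallel retest, both answered by guessing."""
    rng = random.Random(seed)

    def guess_score():
        # Each item is answered correctly with probability 1/2.
        return sum(rng.random() < 0.5 for _ in range(n_items))

    first = [guess_score() for _ in range(n_students)]

    # Indices of the top scorers on the first test.
    n_top = int(n_students * top_fraction)
    ranked = sorted(range(n_students), key=lambda i: first[i], reverse=True)
    top = ranked[:n_top]

    # The parallel form is guessed independently of the first test.
    retest = [guess_score() for _ in top]

    return {
        "all_first": statistics.mean(first),
        "top_first": statistics.mean(first[i] for i in top),
        "top_retest": statistics.mean(retest),
    }


if __name__ == "__main__":
    for name, value in simulate_guessing().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the top group's mean on the first test lies well above 50, while its mean on the retest falls back to roughly 50, the mean of the whole class.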

If there were no luck on the test, then all students would score the same on the parallel form as they scored on the original test, and there would be no regression toward the mean.

Real situations fall between these two extremes: scores are a combination of skill and luck. If you choose a subset of people who score above the mean, they will be (on average) above the mean on skill and above the mean on luck. On a retest their previously above-average luck will revert to about average. They will therefore score above the mean due to their above-average skill, but not by as much as they did the first time because they will not be as lucky as they were the first time.
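
The intermediate case can be sketched in the same way. In this illustrative simulation each score is a fixed skill plus fresh luck on every test; the normal distributions, the equal standard deviations for skill and luck, the mean of 50 and the name `simulate_skill_and_luck` are all assumptions made for the sketch.

```python
import random
import statistics


def simulate_skill_and_luck(n=20000, skill_sd=10.0, luck_sd=10.0, seed=1):
    """Score = mean + skill + luck; skill persists across tests, luck does not."""
    rng = random.Random(seed)
    mean_score = 50.0  # assumed population mean

    skill = [rng.gauss(0.0, skill_sd) for _ in range(n)]
    first = [mean_score + s + rng.gauss(0.0, luck_sd) for s in skill]
    second = [mean_score + s + rng.gauss(0.0, luck_sd) for s in skill]

    # Select the top 10% of scorers on the first test.
    cutoff = sorted(first)[n - n // 10]
    top = [i for i in range(n) if first[i] >= cutoff]

    return {
        "top_first": statistics.mean(first[i] for i in top),
        "top_skill": mean_score + statistics.mean(skill[i] for i in top),
        "top_second": statistics.mean(second[i] for i in top),
    }


if __name__ == "__main__":
    for name, value in simulate_skill_and_luck().items():
        print(f"{name}: {value:.1f}")
```

With skill and luck contributing equally, the selected group's retest mean should land about halfway between the population mean and its first-test mean, close to the group's average skill.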

History
The clearest early example of regression towards the mean was perhaps Galton's diagram in his article in the Journal of the Anthropological Institute (vol. 15, 1886, p. 248). It concerned the "Rate of Regression in Hereditary Stature" and compared children's heights with their mid-parent heights, all expressed as deviations from the median. He summarised the results in two statements placed on the diagram, which are as good a summary of the "regression effect" as we are likely to get:
 * When mid-parents are taller than mediocrity (by which he means the median), their children tend to be shorter than they
 * When mid-parents are shorter than mediocrity, their children tend to be taller than they

Galton also drew the first regression line. This was a plot of seed weights and was presented at a Royal Institution lecture in 1877. Galton had seven sets of sweet pea seeds labeled K to Q, and in each packet the seeds were of the same weight. He chose sweet peas on the advice of his cousin Charles Darwin and the botanist Joseph Dalton Hooker, as sweet peas tend to self-fertilise and the seed weight varies little with humidity. He distributed these packets to a group of friends throughout Great Britain, who planted them. At the end of the growing season the plants were uprooted and returned to Galton. The seeds were distributed because Galton's own attempt at this experiment, at Kew Gardens in 1874, had ended with the crop failing.

He found that the weights of the offspring seeds were normally distributed, like those of their parents, and that if he plotted the mean diameter of the offspring seeds against the mean diameter of their parents he could draw a straight line through the points: the first regression line. He also found on this plot that the mean size of the offspring seeds tended toward the overall mean size. He initially referred to the slope of this line as the "coefficient of reversion". Once he discovered that this effect was not a heritable property but the result of his manipulations of the data, he changed the name to the "coefficient of regression". This result was important because it appeared to conflict with the thinking of the time on evolution and natural selection. He went on to do extensive work in quantitative genetics, and in 1888 he coined the term "co-relation" and used the now-familiar symbol "r" for this value.

In additional work he investigated geniuses in various fields and noted that their children, while typically gifted, were almost invariably closer to the average than their exceptional parents. He later described the same effect more numerically by comparing fathers' heights to their sons' heights. Again, the heights of the sons of both unusually tall fathers and unusually short fathers were typically closer to the mean height than their fathers' heights.

Ubiquity
It is important to realize that regression toward the mean is unrelated to the progression of time; it is an artifact of choosing a non-representative sample on one variable and then examining another variable. In Galton's example, the fathers of exceptionally tall sons also tend to be closer to the mean than their sons.

The original version of regression toward the mean assumes a single trait measured twice, giving two correlated measurements of equal reliability. This condition is not necessary, however: the predicting and predicted variables need not be viewed as measures of one identical underlying trait. The implicit assumption that is necessary is that the standard deviations of the predicting and predicted variables are the same, or that the variables have been transformed or interpreted so as to be comparable.

A later version of regression toward the mean posits a predicting variable subject to measurement error, which weakens the predicting coefficient. This interpretation is not necessary either. In the original case, for example, the error in measuring length could be ignored.

Another way to understand regression towards the mean is the phrase "nowhere to go but up/down": if you look at any exceptional group, some related group will likely be less exceptional because of regression towards the mean. For example, if you looked at families with below-average incomes today and compared them with how they were doing 5 years ago, you would likely see that they were doing better earlier, and conclude that things are getting worse. But if you looked at families with below-average incomes 5 years ago, you would likely find that they are doing better now, and conclude that things are getting better. This seeming contradiction is an artifact of regression towards the mean. Since below-average families can only remain in the below-average group or move to the above-average group, the effect of any such change is to pull the statistic towards the mean. Selecting a non-representative sample biases the results away from the mean.
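
The income example can be illustrated with a simulation in which nothing changes on average between the two dates. This is a sketch under stated assumptions: incomes are standardized (mean 0, SD 1), the correlation of 0.7 between "then" and "now" is arbitrary, and the name `income_comparison` is hypothetical.

```python
import math
import random
import statistics


def income_comparison(n=20000, rho=0.7, seed=2):
    """Two standardized income measurements with correlation rho and no trend."""
    rng = random.Random(seed)
    then, now = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # component common to both dates
        # Each value has variance rho + (1 - rho) = 1; the covariance is rho.
        then.append(math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
        now.append(math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))

    low_now = [i for i in range(n) if now[i] < 0]    # below average today
    low_then = [i for i in range(n) if then[i] < 0]  # below average 5 years ago

    def group_means(group):
        return {
            "then": statistics.mean(then[i] for i in group),
            "now": statistics.mean(now[i] for i in group),
        }

    return {"low_now": group_means(low_now), "low_then": group_means(low_then)}


if __name__ == "__main__":
    for group, means in income_comparison().items():
        print(group, {k: round(v, 2) for k, v in means.items()})
```

Families selected for being below average now were, on average, closer to the mean earlier, and families selected for being below average earlier are closer to the mean now, even though the overall distribution is the same at both dates.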

Mathematical derivation
If random variables $$X$$ and $$Y$$ have standard deviations $$S_X$$ and $$S_Y$$ and correlation coefficient $$\rho$$, the slope of the regression line of $$Y$$ on $$X$$ is given by $$\rho \frac{S_Y}{S_X}$$.

A consequence of this is that a change of 1 standard deviation (SD) in $$X$$ is associated with a change of $$\rho$$ SDs in $$Y$$. Unless $$X$$ and $$Y$$ are perfectly correlated, $$|\rho|$$ will be less than unity. Thus, for a given value of $$X$$, the value of $$Y$$ predicted by the regression line is always fewer SDs from its mean than $$X$$ is from its mean. Regression toward the mean occurs whenever $$|\rho| < 1$$, which in practice is almost always the case.

Note that regression toward the mean is more pronounced the less the two variables are correlated, i.e. the smaller $$|\rho|$$ is.
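
As a worked form of the slope formula, the regression line can be written in standardized units. The symbols $$\mu_X$$ and $$\mu_Y$$ (the means of $$X$$ and $$Y$$) and $$\hat{Y}$$ (the predicted value of $$Y$$) are notation assumed here for illustration:

$$\hat{Y} - \mu_Y = \rho \frac{S_Y}{S_X} (X - \mu_X) \quad\Longrightarrow\quad \frac{\hat{Y} - \mu_Y}{S_Y} = \rho \, \frac{X - \mu_X}{S_X}$$

For instance, with $$\rho = 0.5$$, a value of $$X$$ that lies 2 SDs above its mean predicts a value of $$Y$$ only 1 SD above its mean.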

The phenomenon of regression toward the mean is related to Stein's example.

An example comes from the heritability of IQ (or height, or weight), or of any other symmetric, bell-shaped measure that depends on multiple factors, some of which are inherited. If 80% of the measure's variation is due to heredity, an individual's expected value as a percentile is .8(average of the parents' percentiles) + .2(population mean percentile). Taking the population mean percentile to be 50, this is

 * .4(father's percentile) + .4(mother's percentile) + 10.
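
The arithmetic of this formula can be sketched as a short function. The name `expected_percentile` and its parameters are hypothetical, and the default population mean percentile of 50 is an assumption consistent with the constant 10 in the formula.

```python
def expected_percentile(father, mother, heritability=0.8, population_mean=50.0):
    """Weighted average of the mid-parent percentile and the population mean."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    midparent = (father + mother) / 2
    return heritability * midparent + (1 - heritability) * population_mean


if __name__ == "__main__":
    # Both parents at the 90th percentile.
    print(round(expected_percentile(90, 90), 1))
```

With both parents at the 90th percentile, the expected value is about the 82nd percentile: above the population mean, but closer to it than the parents are.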

Regression fallacies
Misunderstandings of the principle (known as "regression fallacies") have repeatedly led to mistaken claims in the scientific literature.

An extreme example is Horace Secrist's 1933 book The Triumph of Mediocrity in Business, in which the statistics professor collected mountains of data to prove that the profit rates of competitive businesses tend toward the average over time. In fact, there is no such effect; the variability of profit rates is almost constant over time. Secrist had only described the common regression toward the mean. One exasperated reviewer likened the book to "proving the multiplication table by arranging elephants in rows and columns, and then doing the same for numerous other kinds of animals".

A different regression fallacy occurs in the following example. We want to test whether a certain stress-reducing drug increases the reading skills of poor readers. Pupils are given a reading test. The lowest-scoring 10% are then given the drug and tested again, with a different test that also measures reading skill. We find that the average reading score of our group has improved significantly. This, however, does not show anything about the effectiveness of the drug: even without the drug, the principle of regression toward the mean would have predicted the same outcome. (The solution is to introduce a control group, compare results between the group to which the drug was administered and the control group, and make no comparisons with the original population. This removes the bias between the groups compared.)
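
The role of the control group can be illustrated with a simulation. This is a sketch under stated assumptions, not a model of any real trial: reading scores are skill plus normally distributed luck, the lowest-scoring 10% are split at random into treated and control halves, and the name `reading_trial` and the parameter `drug_effect` are hypothetical.

```python
import random
import statistics


def reading_trial(n=20000, drug_effect=0.0, seed=3):
    """Return the mean score gain of treated and control poor readers."""
    rng = random.Random(seed)
    skill = [rng.gauss(100.0, 10.0) for _ in range(n)]
    first = [s + rng.gauss(0.0, 10.0) for s in skill]

    # Select the lowest-scoring 10% on the first test.
    cutoff = sorted(first)[n // 10]
    poor = [i for i in range(n) if first[i] < cutoff]

    # Randomly assign half of them to the drug and half to the control group.
    rng.shuffle(poor)
    half = len(poor) // 2
    treated, control = poor[:half], poor[half:]

    def retest(i, gets_drug):
        # Same skill, fresh luck, plus the drug effect if treated.
        return skill[i] + rng.gauss(0.0, 10.0) + (drug_effect if gets_drug else 0.0)

    gain_treated = statistics.mean(retest(i, True) - first[i] for i in treated)
    gain_control = statistics.mean(retest(i, False) - first[i] for i in control)
    return gain_treated, gain_control


if __name__ == "__main__":
    treated_gain, control_gain = reading_trial(drug_effect=0.0)
    print(f"treated: {treated_gain:.1f}, control: {control_gain:.1f}")
```

With `drug_effect=0.0` both groups improve by a similar amount, purely through regression toward the mean, so only the difference between the two gains carries information about the drug.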

The calculation and interpretation of "improvement scores" on standardized educational tests in Massachusetts probably provides another example of the regression fallacy. In 1999, schools were given improvement goals. For each school, the Department of Education tabulated the difference between the average scores achieved by students in 1999 and in 2000. It was quickly noted that most of the worst-performing schools had met their goals, which the Department of Education took as confirmation of the soundness of its policies. However, it was also noted that many of the supposedly best schools in the Commonwealth, such as Brookline High School (with 18 National Merit Scholarship finalists), were declared to have failed. As in many cases involving statistics and public policy, the issue is debated, but "improvement scores" were not announced in subsequent years and the findings appear to be a case of regression to the mean.

The psychologist Daniel Kahneman referred to regression to the mean in his speech when he won the 2002 Bank of Sweden prize for economics:

I had the most satisfying Eureka experience of my career while attempting to teach flight instructors that praise is more effective than punishment for promoting skill-learning. When I had finished my enthusiastic speech, one of the most seasoned instructors in the audience raised his hand and made his own short speech, which began by conceding that positive reinforcement might be good for the birds, but went on to deny that it was optimal for flight cadets. He said, "On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don't tell us that reinforcement works and punishment does not, because the opposite is the case." This was a joyous moment, in which I understood an important truth about the world: because we tend to reward others when they do well and punish them when they do badly, and because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them. I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback. We measured the distances from the target and could see that those who had done best the first time had mostly deteriorated on their second try, and vice versa. But I knew that this demonstration would not undo the effects of lifelong exposure to a perverse contingency.

In sports
Statistical analysts have long recognized the effect of regression to the mean in sports; they even have a special name for it: the "sophomore slump". For example, Carmelo Anthony of the NBA's Denver Nuggets had an outstanding rookie season in 2004. It was so outstanding, in fact, that he could not reasonably be expected to repeat it: in 2005, Anthony's numbers dropped slightly from his torrid rookie season. Explanations for the "sophomore slump" abound, as sports are all about adjustment and counter-adjustment, but luck-based excellence as a rookie is as good a reason as any.

Of course, not just "sophomores" experience regression to the mean. Any athlete who posts a significant outlier, whether as a rookie (young players are generally not as good as those in their prime seasons) or, particularly, after their prime years (for most sports, the mid to late twenties), can be expected to return to a performance more in line with their established standards. The trick for sports executives, then, is to determine whether a player's previous season was indeed an outlier or whether the player has established a new level of play. This is not easy. Melvin Mora of the Baltimore Orioles put up a season in 2003, at age 31, that was so far from his performance in prior seasons that analysts assumed it had to be an outlier, but in 2004 Mora was even better. Mora, then, had truly established a new level of production, though he will likely regress to his more reasonable 2003 numbers in 2005. Conversely, Kurt Thomas of the New York Knicks significantly ramped up his production in 2001, at an age (29) when players typically start to play more poorly. Sure enough, in the following season Thomas was his old self again, having regressed to the mean of his established level of play.

John Hollinger has an alternate name for the law of regression to the mean, the "fluke rule", while Bill James calls it the "Plexiglass Principle". Whatever you call it, though, regression to the mean is a fact of life, and also of sports.

Regression to the mean in sports performance is, in all probability, the source of the "Sports Illustrated Cover Jinx" superstition. Some athletes believe that appearing on the cover of Sports Illustrated jinxes their future performance, when the apparent jinx is an artifact of regression. A similar effect is perceived in the "Madden Curse", where athletes featured on the cover of the Madden NFL video game have suffered subsequent setbacks and declines.