Pythagorean expectation

Pythagorean expectation is a formula invented by Bill James to estimate how many games a baseball team "should" have won based on the number of runs it scored and allowed. Comparing a team's actual winning percentage with its Pythagorean winning percentage can be used to evaluate how "lucky" or "clutch" that team was. The term is derived from the formula's resemblance to the Pythagorean theorem.

The basic formula is:


 * $$\mathrm{Win\%} = \frac{\mathrm{Runs\ Scored}^2}{\mathrm{Runs\ Scored}^2 + \mathrm{Runs\ Allowed}^2} = \frac{1}{1+(\mathrm{Runs\ Allowed}/\mathrm{Runs\ Scored})^2}$$

where Win% is the winning percentage generated by the formula. The expected number of wins would be the expected winning percentage multiplied by the number of games played.
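
As a minimal sketch of the calculation above, the following Python computes the expected winning percentage and expected wins. The function names and the sample season totals (800 runs scored, 700 allowed, 162 games) are hypothetical and chosen only for illustration.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("the formula is undefined when both totals are zero")
    scored_term = runs_scored ** exponent
    allowed_term = runs_allowed ** exponent
    return scored_term / (scored_term + allowed_term)


def expected_wins(runs_scored: float, runs_allowed: float, games: int) -> float:
    """Expected wins: expected winning percentage times games played."""
    return pythagorean_win_pct(runs_scored, runs_allowed) * games


if __name__ == "__main__":
    # Hypothetical season totals, not taken from any real team.
    pct = pythagorean_win_pct(800, 700)
    wins = expected_wins(800, 700, 162)
    print(f"expected win%: {pct:.3f}, expected wins: {wins:.1f}")
```

With these illustrative inputs the expected winning percentage is about 0.566, or roughly 91.8 wins over 162 games. The `exponent` parameter defaults to 2, matching the basic formula.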

Empirical origin
Empirically, this formula correlates fairly well with how baseball teams actually perform, although an exponent of 1.81 is slightly more accurate. This correlation is one justification for using runs as a unit of measurement for player performance. Efforts have been made to find the ideal exponent for the formula. The most widely known is the Pythagenport formula, developed by Clay Davenport of Baseball Prospectus, which sets the exponent to 1.5 log((r + ra)/g) + 0.45. Less well known but equally effective is the Pythagenpat formula, developed by David Smyth, which sets the exponent to ((r + ra)/g)^0.287. Here r is runs scored, ra is runs allowed, and g is games played. Davenport expressed his support for the latter of the two, saying: "After further review, I (Clay) have come to the conclusion that the so-called Smyth/Patriot method, aka Pythagenpat, is a better fit. In that, X=((rs+ra)/g)^.285, although there is some wiggle room for disagreement in the exponent. Anyway, that equation is simpler, more elegant, and gets the better answer over a wider range of runs scored than Pythagenport, including the mandatory value of 1 at 1 rpg."
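
The two exponent formulas can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the logarithm in Pythagenport is base 10, which the text above does not state explicitly.

```python
import math


def runs_per_game(runs_scored: float, runs_allowed: float, games: int) -> float:
    """Combined runs per game for both teams: (r + ra) / g."""
    if games <= 0:
        raise ValueError("games must be positive")
    return (runs_scored + runs_allowed) / games


def pythagenport_exponent(runs_scored: float, runs_allowed: float, games: int) -> float:
    """Pythagenport exponent: 1.5 * log((r + ra) / g) + 0.45.

    Assumption: base-10 logarithm. Requires a positive run total.
    """
    return 1.5 * math.log10(runs_per_game(runs_scored, runs_allowed, games)) + 0.45


def pythagenpat_exponent(runs_scored: float, runs_allowed: float, games: int,
                         power: float = 0.287) -> float:
    """Pythagenpat exponent: ((r + ra) / g) ** power."""
    return runs_per_game(runs_scored, runs_allowed, games) ** power


def win_pct(runs_scored: float, runs_allowed: float, exponent: float) -> float:
    """Pythagorean winning percentage with a chosen exponent (runs_scored > 0)."""
    return 1.0 / (1.0 + (runs_allowed / runs_scored) ** exponent)


if __name__ == "__main__":
    # Hypothetical season totals, used only for illustration.
    r, ra, g = 800, 700, 162
    for name, fn in [("Pythagenport", pythagenport_exponent),
                     ("Pythagenpat", pythagenpat_exponent)]:
        x = fn(r, ra, g)
        print(f"{name}: exponent {x:.3f}, win% {win_pct(r, ra, x):.3f}")
```

At exactly one run per game, `pythagenpat_exponent` returns 1, the property Davenport highlights in the quotation, while `pythagenport_exponent` returns 0.45 under the base-10 assumption. The `power` parameter is left adjustable because the text gives both 0.287 and .285 for it.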

These formulas are necessary only in extreme situations, in which the average number of runs scored per game is either very high or very low. For most situations, simply squaring each variable yields accurate results.

There are some systematic statistical deviations between actual winning percentage and expected winning percentage, attributable to factors that include bullpen quality and luck. In addition, the formula's estimates tend to regress toward the mean: teams that win a lot of games tend to be underrepresented by the formula (meaning they "should" have won fewer games), and teams that lose a lot of games tend to be overrepresented (they "should" have won more).

Theoretical explanation
Initially, the correlation between the formula and actual winning percentage had no theoretical explanation; it had simply been shown to work empirically. However, Professor Steven J. Miller later provided a statistical derivation of the formula: if the runs scored and runs allowed by a team follow independent Weibull distributions, then the formula gives the probability of winning.
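
The Weibull connection can be checked numerically with a small Monte Carlo sketch. This is a simplified illustration, not necessarily the exact model used in the derivation: it assumes two independent, unshifted Weibull distributions with a common shape parameter, treats runs as continuous (so ties have probability zero), and uses hypothetical function names and parameter values. Under those assumptions, the probability that the first draw exceeds the second is scale_for^shape / (scale_for^shape + scale_against^shape), and because each mean is proportional to its scale, the same ratio holds when written in terms of mean runs.

```python
import random


def simulate_weibull_win_prob(scale_for: float, scale_against: float, shape: float,
                              trials: int = 200_000, seed: int = 0) -> float:
    """Estimate P(runs scored > runs allowed) by sampling two Weibull variables."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        scored = rng.weibullvariate(scale_for, shape)
        allowed = rng.weibullvariate(scale_against, shape)
        if scored > allowed:
            wins += 1
    return wins / trials


def pythagorean_from_scales(scale_for: float, scale_against: float, shape: float) -> float:
    """Pythagorean-style ratio, with the Weibull shape acting as the exponent."""
    return scale_for ** shape / (scale_for ** shape + scale_against ** shape)


if __name__ == "__main__":
    # Illustrative parameters only; they are not fitted to real data.
    simulated = simulate_weibull_win_prob(5.0, 4.5, 1.8)
    predicted = pythagorean_from_scales(5.0, 4.5, 1.8)
    print(f"simulated: {simulated:.4f}, closed form: {predicted:.4f}")
```

In this setup the shape parameter plays the role of the Pythagorean exponent, which suggests why the best-fitting exponent need not be exactly 2.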

Use in basketball
When noted basketball analyst Dean Oliver applied James's Pythagorean expectation to his own sport, the result was a similar formula with a much larger exponent:


 * $$\mathrm{Win\%} = \frac{\mathrm{Points\ For}^{14}}{\mathrm{Points\ For}^{14} + \mathrm{Points\ Against}^{14}}.$$

Another noted basketball statistician, John Hollinger, uses a similar Pythagorean formula with an exponent of 16.5.
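
To illustrate how much the choice of exponent matters, the sketch below evaluates the basketball formula with both 14 and 16.5. The function name and the sample scoring averages (110 points for, 105 against) are hypothetical.

```python
def basketball_win_pct(points_for: float, points_against: float,
                       exponent: float = 14.0) -> float:
    """Pythagorean winning percentage for basketball.

    Uses the ratio form 1 / (1 + (against / for) ** exponent), which is
    algebraically the same as for**e / (for**e + against**e).
    """
    if points_for <= 0 or points_against < 0:
        raise ValueError("points_for must be positive and points_against non-negative")
    return 1.0 / (1.0 + (points_against / points_for) ** exponent)


if __name__ == "__main__":
    # Hypothetical per-game averages, used only for illustration.
    for exponent in (14.0, 16.5):
        pct = basketball_win_pct(110, 105, exponent)
        print(f"exponent {exponent}: expected win% {pct:.3f}")
```

For these illustrative averages the expected winning percentage is about 0.657 with an exponent of 14 and about 0.683 with 16.5, so the larger exponent rewards the same scoring margin more heavily.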