Ecological fallacy

The ecological fallacy is a widely recognized error in the interpretation of statistical data, in which inferences about individuals are based solely on aggregate statistics collected for the group to which those individuals belong. The fallacy assumes that every member of a group exhibits the characteristics of the group at large. Stereotypes are one form of ecological fallacy. The fallacy is sometimes equated with biased sampling, but the two are distinct: biased sampling concerns how data are collected and can be reduced by methods such as stratified sampling, whereas the ecological fallacy concerns the level at which conclusions are drawn and can arise even from a representative sample.

Example 1. A study shows that people from Springfield score higher on the SAT, on average, than people from Shelbyville. Assuming that a randomly selected individual from Springfield must have scored higher than a randomly selected individual from Shelbyville would be an ecological fallacy. Because the study reports only averages, it is entirely possible that the individual from Springfield scored in the bottom ten percent and the individual from Shelbyville in the top ten percent.
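The point of Example 1 can be checked with a small sketch. The scores below are invented purely for illustration and do not come from any study; the function name `share_first_higher` is likewise hypothetical. The sketch compares the two group means and then counts, over every possible pairing of one Springfield resident with one Shelbyville resident, how often the Springfield resident has the higher score.

```python
from itertools import product
from statistics import mean

# Hypothetical SAT scores, invented for illustration only.
SPRINGFIELD = [900, 1000, 1100, 1200, 1300]
SHELBYVILLE = [850, 950, 1050, 1150, 1250]


def share_first_higher(first, second):
    """Fraction of all (first, second) pairs in which the first score is higher."""
    pairs = list(product(first, second))
    wins = sum(1 for a, b in pairs if a > b)
    return wins / len(pairs)


print("mean Springfield:", mean(SPRINGFIELD))
print("mean Shelbyville:", mean(SHELBYVILLE))
print("share of pairs where Springfield is higher:",
      share_first_higher(SPRINGFIELD, SHELBYVILLE))
```

With these made-up numbers Springfield has the higher mean, yet in 10 of the 25 possible pairings the Shelbyville resident has the higher score. The group average says little about any one pair of individuals.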

Example 2. Imagine two communities, Chiptown and Pittsville. Within each community there is a divide between the rich and poor, the rich living in gated communities on the hills and the poor living adjacent to the industrial districts that pump carcinogens into their backyards. In both communities, the poor people have a cancer incidence that is many times that of the wealthy people. In Chiptown, where the dominant industry is high-tech computer manufacturing, the overall salaries are higher for both rich and poor people, but the carcinogens spewed into the environment are particularly nasty, giving cancer to nearly all those exposed (almost entirely poor people). Prof. Newbie comes along and decides to examine the risk factors for cancer. He looks up the cancer rates and median incomes of these two towns on the CDC and U.S. Census webpages. He finds, to everyone's surprise, that the cancer incidence is higher in the wealthier community, Chiptown. He concludes that higher income is a risk factor for cancer. In fact, we know that exactly the opposite is true: In the wealthier community of Chiptown, being poor is especially dangerous to one's health.
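Example 2 can be made concrete with a minimal sketch. Every figure below (population sizes, incomes, case counts) is an assumption invented for illustration, as are the names `TOWNS`, `town_summary` and `class_rates`. The first function reproduces the town-level view available to Prof. Newbie; the second shows the within-town rates that the aggregate hides.

```python
from statistics import median

# Hypothetical figures, invented for illustration only.
# town -> class -> (residents, income per resident, cancer cases)
TOWNS = {
    "Chiptown":   {"rich": (400, 200_000, 4), "poor": (600, 60_000, 540)},
    "Pittsville": {"rich": (400, 100_000, 4), "poor": (600, 30_000, 60)},
}


def town_summary(groups):
    """Aggregate view: (median income, overall cancer rate) for one town."""
    incomes = []
    cases = 0
    residents = 0
    for n, income, sick in groups.values():
        incomes.extend([income] * n)
        cases += sick
        residents += n
    return median(incomes), cases / residents


def class_rates(groups):
    """Disaggregated view: cancer rate for each income class in one town."""
    return {name: sick / n for name, (n, _income, sick) in groups.items()}


for town, groups in TOWNS.items():
    print(town, town_summary(groups), class_rates(groups))
```

Under these assumed numbers Chiptown has both the higher median income and the higher overall cancer rate, which is the pattern that misleads Prof. Newbie. Within each town, however, the rate among the poor is far higher than among the rich, so the town-level association points in the opposite direction from the individual-level one.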

The ecological fallacy was a factor in the judge's decision to uphold the election of Christine Gregoire in the court challenge to the 2004 Washington gubernatorial election. The challengers argued that illegal votes cast in the election would have followed the voting patterns of the precincts in which they were cast, which they contended would have favored Gregoire. The judge determined that this reasoning constituted an ecological fallacy and disallowed the evidence. An expert witness for Gregoire explained the ecological fallacy as trying to work out Ichiro Suzuki's batting average by looking at the batting average of the entire Seattle Mariners team.

The ecological fallacy is exceptionally common in population research.

The reverse ecological fallacy occurs when measurements of individuals are generalized to a group. For example, on meeting someone from a culture not previously encountered, one might conclude that all people from that culture behave as that particular individual does.

Origin
The term comes from a 1950 paper by William S. Robinson. For each of the 48 states in the US as of the 1930 census, he computed the literacy rate and the proportion of the population born outside the US. He found that these two figures had a positive correlation of 0.53; in other words, the greater the proportion of immigrants in a state, the higher its average literacy. However, when individuals were considered, the correlation was −0.11: immigrants were on average less literate than native citizens. Robinson showed that the positive correlation at the level of state populations arose because immigrants tended to settle in states where the native population was more literate. He cautioned against drawing conclusions about individuals on the basis of population-level, or "ecological", data.
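The sign reversal Robinson described can be reproduced in a small sketch. The five states and all counts below are invented for illustration and are not Robinson's census data; the helper names are hypothetical. In every assumed state, immigrants are less literate than natives, but the immigrant share is larger where native literacy is higher. The sketch computes the Pearson correlation once across states (immigrant share against literacy rate) and once across individuals (immigrant or not against literate or not).

```python
from math import sqrt

# Hypothetical counts per state, invented for illustration only:
# (literate natives, illiterate natives, literate immigrants, illiterate immigrants)
STATES = {
    "A": (760, 190, 35, 15),
    "B": (765, 135, 75, 25),
    "C": (765, 85, 120, 30),
    "D": (760, 40, 170, 30),
    "E": (735, 15, 220, 30),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def state_level_r(states):
    """Correlate each state's immigrant share with its literacy rate."""
    shares, rates = [], []
    for nat_lit, nat_ill, imm_lit, imm_ill in states.values():
        total = nat_lit + nat_ill + imm_lit + imm_ill
        shares.append((imm_lit + imm_ill) / total)
        rates.append((nat_lit + imm_lit) / total)
    return pearson(shares, rates)


def individual_level_r(states):
    """Correlate immigrant status (0/1) with literacy (0/1) over all individuals."""
    immigrant, literate = [], []
    for nat_lit, nat_ill, imm_lit, imm_ill in states.values():
        cells = ((nat_lit, 0, 1), (nat_ill, 0, 0), (imm_lit, 1, 1), (imm_ill, 1, 0))
        for count, is_immigrant, is_literate in cells:
            immigrant.extend([is_immigrant] * count)
            literate.extend([is_literate] * count)
    return pearson(immigrant, literate)


print("state-level correlation:     ", round(state_level_r(STATES), 3))
print("individual-level correlation:", round(individual_level_r(STATES), 3))
```

With these assumed counts the state-level correlation is positive while the individual-level correlation is negative. The magnitudes are an artifact of the invented data and are not meant to match the figures Robinson reported.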