Ascertainment bias

In scientific research, ascertainment bias occurs when non-random sampling produces false results, because conclusions about an entire group are drawn from a distorted or atypical sample. If this is not accounted for, results can be wrongly attributed to the phenomenon under study rather than to the method of sampling. It is one of the most common reasons that researchers in the medical, social, or biological sciences discover an association or correlation that does not actually exist. Depending on the study, ascertainment bias may be easy to recognize or very difficult to detect.

Examples
For example, to find the male/female ratio in a country it is not necessary to count everyone in the country; a statistical sample of the population will be adequate. However, the way the sample is selected can influence the result. If only the residents of a housing project for elderly persons were counted, the result could be biased in favor of females, who on average live longer than males.

A simple classroom demonstration of ascertainment bias is to estimate the primary sex ratio (approximately 50:50) by asking all female students to report the ratio of girls to boys in their own families, and comparing the result with the same question asked of male students. The women will collectively report a higher proportion of girls, since the survey method ensures that every reported family has at least one girl; a respondent who is an only child, for instance, reports a family that is entirely female. The men will report a higher proportion of boys, for the complementary reason.
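The classroom effect can be reproduced with a small simulation. This is a sketch under stated assumptions, not part of the original demonstration: family sizes are drawn uniformly from 1 to 4 (a hypothetical choice), each child is a girl with probability 1/2, and every child of the respondent's sex is treated as a respondent who reports her or his whole sibship, including themselves. The helper names `simulate_families` and `reported_female_fraction` are illustrative.

```python
import random

def simulate_families(n_families, max_size=4, seed=0):
    """Hypothetical population: each family has 1..max_size children,
    and each child is 'F' or 'M' with probability 1/2."""
    rng = random.Random(seed)
    return [
        [rng.choice("FM") for _ in range(rng.randint(1, max_size))]
        for _ in range(n_families)
    ]

def reported_female_fraction(families, respondent_sex):
    """Pool the reports of every child of `respondent_sex`; each such child
    reports the composition of the whole family, including themselves."""
    girls = children = 0
    for fam in families:
        k = fam.count(respondent_sex)  # number of respondents from this family
        girls += k * fam.count("F")
        children += k * len(fam)
    return girls / children

families = simulate_families(100_000)
all_kids = [c for fam in families for c in fam]
true_frac = all_kids.count("F") / len(all_kids)
from_women = reported_female_fraction(families, "F")
from_men = reported_female_fraction(families, "M")
print(f"true {true_frac:.3f}  women report {from_women:.3f}  men report {from_men:.3f}")
```

Under these assumptions the women's pooled reports should come out near 2/3 girls and the men's near 1/3 girls, although the true proportion is 1/2. Families with no girls are never reported by women, and families with several girls are reported more than once.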

Ascertainment bias is important in studying the genetics of medical conditions, since data are typically collected by physicians in a clinical setting. The results may be skewed because the sample consists of patients who have seen a doctor, rather than a random sample of the population as a whole. Berkson's paradox illustrates this effect.
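Berkson's paradox can be seen in a few lines of simulation. In this sketch, two conditions A and B are generated independently with hypothetical prevalences of 10% each, and a hypothetical clinic sees everyone who has at least one of them. In the population, having A tells us nothing about B; among patients, everyone without A must have B, so the two conditions appear negatively associated. The helper `rate_of_b` is illustrative.

```python
import random

def rate_of_b(group, has_a):
    """Fraction of people with condition B among those whose A-status is has_a."""
    sub = [b for a, b in group if a == has_a]
    return sum(sub) / len(sub)

rng = random.Random(1)
P_A, P_B = 0.1, 0.1  # hypothetical prevalences of two unrelated conditions
population = [(rng.random() < P_A, rng.random() < P_B) for _ in range(200_000)]
# Hypothetical admission rule: anyone with A or B (or both) is seen at the clinic.
patients = [(a, b) for a, b in population if a or b]

print("population: P(B|A)=%.3f  P(B|not A)=%.3f"
      % (rate_of_b(population, True), rate_of_b(population, False)))
print("patients:   P(B|A)=%.3f  P(B|not A)=%.3f"
      % (rate_of_b(patients, True), rate_of_b(patients, False)))
```

In the population both rates should be close to 0.1. Among patients, the rate of B stays near 0.1 for those with A but is exactly 1.0 for those without A, purely because of how patients were selected.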

Careful experimental design can often minimize this effect. Alternatively, the non-random sampling can be explicitly taken into account when the results are analyzed.

Pedigree studies


Geneticists are limited in how they can obtain data from human populations. As an example, consider a human characteristic that we suspect is inherited as a simple recessive Mendelian trait. Following the laws of Mendelian inheritance, if both parents in a family carry the allele for the characteristic but do not show it, they are carriers (heterozygous), and each of their children has a 1/4 chance of showing the characteristic. The problem is that we cannot tell which families have two carrier parents unless they have a child who exhibits the characteristic. The description here follows the textbook by Sutton.

The figure shows the pedigrees of all possible two-child families in which both parents are carriers (Aa). If we could discover all such families, each pedigree would occur with the probability listed under "nontruncate selection". In practice, though, we cannot discover the families that have no affected children. This situation is called "truncate selection", since the families without affected children are truncated from the study. If every family with at least one affected child has an equal chance of being selected for the study, the situation is called complete truncate selection. If instead affected individuals are selected, each with the same small chance, then families with two affected children are about twice as likely to be chosen as families with one; this is called single truncate selection. The figure gives the probability of each family being selected, together with the resulting sample frequency of affected children. In this simple case, the researcher will observe a frequency of 4/7 for the characteristic under complete truncate selection and 5/8 under single truncate selection, rather than the 1/4 expected from Mendelian inheritance.
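The sample frequencies can be checked by enumerating the four birth-order pedigrees of a two-child family with exact fractions. This is a sketch, not the textbook's procedure: single truncate selection is modeled as each family's chance of selection being proportional to its number of affected children, the usual idealization when each affected child has a small, independent chance of being ascertained. The last two functions show one way to undo the bias in this two-child case: under complete truncate selection the observed frequency f equals 1/(2 - p), so p = 2 - 1/f; under single truncate selection, ignoring the affected child (proband) who brought the family into the study and counting only the other sibs recovers p. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product

P_AFFECTED = Fraction(1, 4)  # each child of two Aa carriers is aa with probability 1/4
SIBSHIP = 2                  # two-child families, as in the figure

def pedigrees():
    """Yield (number affected, probability) for each birth-order pedigree."""
    for kids in product([True, False], repeat=SIBSHIP):
        prob = Fraction(1)
        for affected in kids:
            prob *= P_AFFECTED if affected else 1 - P_AFFECTED
        yield sum(kids), prob

def affected_frequency(weight):
    """Frequency of affected children in the sample when a family with r
    affected children enters the study with relative weight weight(r)."""
    num = den = Fraction(0)
    for r, prob in pedigrees():
        w = prob * weight(r)
        num += w * r
        den += w * SIBSHIP
    return num / den

nontruncate = affected_frequency(lambda r: 1)
complete = affected_frequency(lambda r: 1 if r > 0 else 0)
single = affected_frequency(lambda r: r)  # selection chance proportional to r
print(nontruncate, complete, single)  # 1/4 4/7 5/8

# Complete selection: expected affected per ascertained family is
# 2p / (1 - (1 - p)**2) = 2 / (2 - p), so the observed frequency is f = 1 / (2 - p).
def correct_complete(f):
    return 2 - 1 / f

def proband_corrected_single():
    """Single selection: drop one affected child (the proband) per family
    and count affected children among the remaining sibs."""
    num = den = Fraction(0)
    for r, prob in pedigrees():
        w = prob * r
        num += w * (r - 1)
        den += w * (SIBSHIP - 1)
    return num / den

print(correct_complete(complete), proband_corrected_single())  # 1/4 1/4
```

Both corrections recover the Mendelian 1/4 exactly here; they depend on the sibship size and selection model assumed in this sketch.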

The methods of analysis for more general situations go back to studies by J. B. S. Haldane and R. A. Fisher in the 1930s.

The caveman effect
Cave paintings made nearly 40,000 years ago have survived to be rediscovered. If there had been contemporary paintings on trees, animal skins, or hillsides, they would have been washed away long ago. Similarly, evidence of fire pits, middens, burial sites, etc., is most likely to have survived intact to the modern era in caves. Prehistoric people are associated with caves because that is where the evidence still exists, not necessarily because most of them lived in caves for most of their lives.

Clinical bias
The study of a medical condition typically begins with anecdotal reports. By their nature, such reports include only people who were referred for diagnosis and treatment. A child who cannot function in school is more likely to be diagnosed with dyslexia than a child who struggles but passes. A child examined for one condition is more likely to be tested for, and diagnosed with, other conditions, skewing comorbidity statistics. As certain diagnoses become associated with behavior problems or intellectual disability, parents try to keep their children from being stigmatized with those diagnoses, introducing further bias. Studies that carefully sample whole populations are showing that many conditions are much more common, and usually much milder, than formerly believed.
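How referral distorts both prevalence and severity can be sketched with a simulation. Every number here is hypothetical: a condition with 10% prevalence, severity drawn from an exponential distribution with mean 1 (arbitrary units), and referral to a clinic only when severity exceeds 2. Counting only referred cases makes the condition look rare and severe, while a whole-population study recovers a common, mostly mild condition.

```python
import random

rng = random.Random(2)
N = 100_000                # hypothetical population size
TRUE_PREVALENCE = 0.10     # hypothetical
REFERRAL_THRESHOLD = 2.0   # hypothetical: only cases this severe reach a clinic

# Severity of each affected person (exponential with mean 1, an assumption).
severities = [rng.expovariate(1.0) for _ in range(N) if rng.random() < TRUE_PREVALENCE]
referred = [s for s in severities if s > REFERRAL_THRESHOLD]

pop_prev, pop_mean = len(severities) / N, sum(severities) / len(severities)
clinic_prev, clinic_mean = len(referred) / N, sum(referred) / len(referred)
print(f"population study: prevalence {pop_prev:.3f}, mean severity {pop_mean:.2f}")
print(f"clinic series:    prevalence {clinic_prev:.3f}, mean severity {clinic_mean:.2f}")
```

With these assumptions only about e^-2, or roughly 13.5%, of cases are referred, and because the exponential distribution is memoryless their mean severity is about 3, three times the population mean.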