Population groups in biomedicine

Biomedical researchers subdivide populations into groups with the goal of improving the prevention and treatment of diseases. Many studies have found that disease susceptibility and environmental responses vary among U.S. ethnicities, among New World peoples with different ratios of African-European-Amerind continent-of-ancestry genetic admixture, and among genetically isolated groups with unusual genes. A problem in interpreting such findings is that researchers and lay public alike often use the words "race" and "ethnicity" as shorthand for each of at least three different ways of classifying populations. Researchers are actively studying how genes affect susceptibility to disease, but the field suffers from problems in terminology.

Three approaches to grouping populations for biomedical research
First, due to social inequality, different U.S. ethnic groups show different rates of certain diseases. Second, the recent discovery of ancestry-informative genetic markers has enabled investigators to correlate the incidence of certain diseases with actual Afro-Euro-Amerind admixture, thus separating genetic from cultural effects. Third, the decoding of the human genome has enabled more intense study of genetic predisposition to certain diseases beyond the previously known monogenic diseases like sickle-cell and thalassemia.

Differences among U.S. voluntary ethnic self-identity groups
U.S. ethnic groups can exhibit substantial average differences in disease incidence, disease severity, disease progression, and response to treatment. The African American community has a higher rate of mortality than does any other U.S. ethnic group for 8 of the top 10 causes of death. U.S. Latinos have higher rates of death from diabetes, liver disease, and infectious diseases than do non-Latinos. Native Americans suffer from higher rates of diabetes, tuberculosis, pneumonia, influenza, and alcoholism than does the rest of the U.S. population. European Americans die more often from heart disease and cancer than do Native Americans, Asian Americans, or Hispanics.

Considerable evidence indicates that the inter-ethnicity health disparities observed in the United States arise mostly through the effects of discrimination, differences in treatment, poverty, lack of access to health care, health-related behaviors, racism, stress, and other socially mediated forces. The infant mortality rate for African Americans is approximately twice the rate for European Americans, but, in a study that looked at members of these two groups who belonged to the military and received care through the same medical system, their infant mortality rates were essentially equivalent. Recent immigrants to the United States from Mexico have better indicators on some measures of health than do Mexican Americans who are more assimilated into American culture. Diabetes and obesity are more common among Native Americans living on U.S. reservations than among those living outside reservations. Rates of heart disease among African Americans are associated with the segregation patterns in the neighborhoods where they live. Furthermore, the risks for many diseases are elevated for socially, economically, and politically disadvantaged groups in the United States, suggesting that socioeconomic inequities are the root causes of most of the differences.

U.S. ethnic self-identity has also been found to be associated with susceptibility to complex, multifactorial and multigenic diseases. The incidence and death rates of prostate and breast cancers are significantly higher among African Americans than among European Americans. African Americans have increased susceptibility to both obesity and abnormal levels of insulin secretion. Likewise, Hispanic, American Indian, African American, Pacific Islander, and South Asian ethnicities are considered risk factors for diabetes. Also, the incidence of heart disease and high blood pressure is higher in African Americans than in European Americans.

Polymorphisms in the regulatory region of the CCR5 gene affect the rate of progression to AIDS and death in HIV-infected patients. While some CCR5 haplotypes are beneficial in multiple populations, other haplotypes have population-specific effects. For example, the HHE haplotype of CCR5 is associated with delayed disease progression in those who self-identified as White Americans, but with accelerated disease progression in those who self-identified as Black Americans. Similarly, alleles of the CARD15 (also called NOD2) gene are associated with Crohn's disease, an inflammatory bowel disorder, in White Americans. However, neither these nor any other alleles of CARD15 have been associated with Crohn's disease in Black Americans or Asian Americans.

No evidence yet found suggests that disease-related differences among U.S. voluntary ethnic self-identity groups also appear in other nations.

Differences associated with continent-of-ancestry genetic admixture ratios
With one exception, populations throughout the New World are a unimodal genetic mix of European, African, and Native American ancestry. The prevalence of some diseases has been found to vary with such admixture ratios. Examples include hypertension, diabetes, obesity and prostate cancer. However, in none of these cases have Afro-Euro-Amerind admixture ratios been shown to account for a significant fraction of the difference in disease prevalence, and the role of genetic factors in generating these differences remains uncertain.

No evidence yet found suggests that disease correlations with New World Afro-Euro-Amerind admixture ratios exist in the Old World.

Differences among genetically isolated groups with unusual genes
Finally, groups that were historically isolated (in a genetic sense) sometimes have unusual DNA variants that predispose to monogenic disease. Such diseases include Ellis–van Creveld syndrome among the Pennsylvania Amish, Tay-Sachs disease among Ashkenazi Jewish populations, and geographical hemoglobinopathies (sickle cell anemia, thalassemia) among people with ancestors who lived in malarial regions. The incidence of monogenic diseases differs between such once-isolated groups.

Specific populations are now associated with differential disease susceptibility and environmental responses. Many highly penetrant Mendelian diseases that are caused by mutations in a single gene are known to be found at higher frequencies in certain populations. The HbS allele that causes sickle-cell anemia is found at higher frequencies in sub-Saharan Africans and Southern Europeans. Similarly, the ΔF508 allele of CFTR that causes cystic fibrosis is found at higher frequencies in Northern Europeans. It is believed that many of these mutations first occurred in the population that is most affected.

In contrast to diseases associated with voluntary U.S. ethnic self-identity, and in contrast to diseases linked to New World Euro-Afro-Amerind admixture ratios, links between monogenic diseases and delineated populations are universal in that they appear no matter where the once-isolated population now lives.

The allelic architecture of disease
Sometimes the link between a disease and an unusual gene variant is more subtle. The genetic architecture of common diseases is an important factor in determining the extent to which patterns of genetic variation influence group differences in health outcomes. According to the common disease/common variant hypothesis, common variants present in the ancestral population before the dispersal of modern humans from Africa play an important role in human diseases. Genetic variants associated with Alzheimer disease, deep venous thrombosis, Crohn disease, and type 2 diabetes appear to adhere to this model. However, the generality of the model has not yet been established and, in some cases, is in doubt. Some diseases, such as many common cancers, appear not to be well described by the common disease/common variant model.

Another possibility is that common diseases arise in part through the action of combinations of variants that are individually rare. Most of the disease-associated alleles discovered to date have been rare, and rare variants are more likely than common variants to be differentially distributed among groups distinguished by ancestry. However, groups could harbor different, though perhaps overlapping, sets of rare variants, which would reduce contrasts between groups in the incidence of the disease.

The number of variants contributing to a disease and the interactions among those variants also could influence the distribution of diseases among groups. The difficulty that has been encountered in finding contributory alleles for complex diseases and in replicating positive associations suggests that many complex diseases involve numerous variants rather than a moderate number of alleles, and the influence of any given variant may depend in critical ways on the genetic and environmental background. If many alleles are required to increase susceptibility to a disease, the odds are low that the necessary combination of alleles would become concentrated in a particular group purely through drift.
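
The reasoning about required combinations of alleles can be illustrated with a short calculation. The sketch below is a simplified assumption rather than a model taken from the literature cited here: it supposes that susceptibility requires at least one copy of a risk allele at each of several loci, that every risk allele has the same frequency, and that the loci are independent and in Hardy-Weinberg proportions. The function name carrier_fraction and the example frequency are hypothetical.

```python
def carrier_fraction(allele_freq: float, n_loci: int) -> float:
    """Fraction of individuals carrying at least one risk allele at every locus.

    Assumes independent loci, Hardy-Weinberg proportions, and the same
    risk-allele frequency at each locus (illustrative assumptions only).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    if n_loci < 0:
        raise ValueError("n_loci must be non-negative")
    # Probability of carrying at least one copy at a single diploid locus.
    per_locus = 1.0 - (1.0 - allele_freq) ** 2
    # Independence across loci lets the per-locus probabilities multiply.
    return per_locus ** n_loci


if __name__ == "__main__":
    # Hypothetical risk-allele frequency of 10 percent at every locus.
    for k in (1, 2, 5, 10):
        print(k, carrier_fraction(0.10, k))
```

Under these assumptions the fraction of individuals carrying the full combination shrinks geometrically as the number of required loci grows, which is one way to see why a many-allele combination is unlikely to be common in any single group.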

Population substructure in genetics research
One area in which population categories can be important considerations in genetics research is in controlling for confounding between population substructure, environmental exposures, and health outcomes. Association studies can produce spurious results if cases and controls have differing allele frequencies for genes that are not related to the disease being studied, although the magnitude of this problem in genetic association studies is subject to debate. Various methods have been developed to detect and account for population substructure, but these methods can be difficult to apply in practice.
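
A small worked example shows how such a spurious association can arise. The counts below are invented purely for illustration and are not drawn from any study: two hypothetical subpopulations differ in both carrier frequency and baseline disease risk, while within each subpopulation the allele has no effect on disease. Pooling them without regard to substructure produces an apparent association.

```python
def odds_ratio(case_carriers, control_carriers, case_noncarriers, control_noncarriers):
    """Odds ratio of disease for carriers versus non-carriers in a 2x2 table."""
    return (case_carriers * control_noncarriers) / (control_carriers * case_noncarriers)


# Hypothetical counts: (case carriers, control carriers,
#                       case non-carriers, control non-carriers)
# Subpopulation A: 80% carriers, 20% disease risk regardless of genotype.
subpop_a = (160, 640, 40, 160)
# Subpopulation B: 20% carriers, 5% disease risk regardless of genotype.
subpop_b = (10, 190, 40, 760)

# Pooling ignores substructure by adding the two tables cell by cell.
pooled = tuple(a + b for a, b in zip(subpop_a, subpop_b))

or_a = odds_ratio(*subpop_a)
or_b = odds_ratio(*subpop_b)
or_pooled = odds_ratio(*pooled)

if __name__ == "__main__":
    print("odds ratio within A:", or_a)
    print("odds ratio within B:", or_b)
    print("odds ratio after pooling:", round(or_pooled, 3))
```

Within each subpopulation the odds ratio is exactly 1, yet the pooled table yields an odds ratio above 2, because carrier status acts as a proxy for subpopulation membership. Analyzing each stratum separately, as done here, is the simplest form of adjustment; the methods used in practice are more elaborate.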

Population substructure also can be used to advantage in genetic association studies. For example, populations that represent recent mixtures of geographically separated ancestral groups can exhibit longer-range linkage disequilibrium between susceptibility alleles and genetic markers than is the case for other populations. Genetic studies can use this admixture linkage disequilibrium to search for disease alleles with fewer markers than would be needed otherwise. Association studies also can take advantage of the contrasting experiences of racial or ethnic groups, including migrant groups, to search for interactions between particular alleles and environmental factors that might influence health.
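
The persistence of admixture linkage disequilibrium can be sketched with the standard textbook approximation for a single admixture event followed by random mating: the initial disequilibrium between two loci is m(1 - m) times the product of the allele-frequency differences between the two parental populations, and it decays by a factor of (1 - r) per generation, where r is the recombination fraction. The function name and the example values below are hypothetical, and the single-pulse, random-mating model is a simplifying assumption.

```python
def admixture_ld(m, delta_a, delta_b, r, generations):
    """Expected linkage disequilibrium D between two loci after admixture.

    m           -- admixture proportion from the first parental population
    delta_a     -- allele-frequency difference between parents at locus A
    delta_b     -- allele-frequency difference between parents at locus B
    r           -- recombination fraction between the loci (0 to 0.5)
    generations -- generations of random mating since the admixture event
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must be between 0 and 1")
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must be between 0 and 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # Disequilibrium created at the moment of mixing.
    d_initial = m * (1.0 - m) * delta_a * delta_b
    # Recombination erodes it by a factor (1 - r) each generation.
    return d_initial * (1.0 - r) ** generations


if __name__ == "__main__":
    # Hypothetical example: 20% admixture, strongly differentiated markers,
    # ten generations since admixture, at several recombination fractions.
    for r in (0.001, 0.01, 0.1, 0.5):
        print(r, admixture_ld(0.2, 0.8, 0.8, r, 10))
```

For recently admixed populations the disequilibrium remains appreciable even between loosely linked loci, which is why fewer markers can suffice; only markers whose frequencies differ substantially between the parental populations contribute.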

Disease association studies
The common disease-common variant (often abbreviated CD-CV) hypothesis predicts that common disease-causing alleles will be found in all populations. An often-cited example is an allele of apolipoprotein E, APOE ε4, which is associated in a dose-dependent manner with susceptibility to Alzheimer's disease. This allele is found in Africans and Europeans. However, many disease-causing alleles are found to have different effects in different populations, a pattern that may reflect epistatic interactions with the genetic background.

Terminology problems
The need to subdivide the human species into groups for analytical purposes has existed throughout recorded history. Until about a century ago, such groups were termed "races," a term that soon after its 17th-century coinage became politicized and used to support colonization, exploitation, cruelty and oppression among nations, religions, cultures, and those of different continental ancestries. Consequently, today the conceptual act of merely dividing populations into groups for analysis has become suspect and even controversial. This has led to the euphemization of population terms. The confusion is aggravated by the alignment between the U.S. endogamous color line and Euro-African admixture ratios. And so the controversy has jeopardized the credibility of otherwise useful studies.

The very act of conceptually dividing populations into groups is controversial.
Among physical anthropologists at least, support for dividing populations by "race" has fallen steadily over the past century. Whereas 78 percent of the articles in the 1931 Journal of Physical Anthropology employed "race" or synonymous terms reflecting a bio-race paradigm, only 36 percent did so in 1965, and just 28 percent did in 1996. The paradigm has also lost favor among medical researchers and practitioners. In February 2001, the editors of the medical journal Archives of Pediatrics and Adolescent Medicine asked authors to no longer use "race" as an explanatory variable and to avoid obsolescent terms. Other prestigious peer-reviewed journals, such as the New England Journal of Medicine and the American Journal of Public Health, have done the same. Furthermore, the National Institutes of Health recently issued a program announcement for grant applications through February 1, 2006, specifically seeking researchers who can investigate and publicize among primary care physicians the detrimental effects on the nation's health of the practice of medical racial profiling using such terms. The program announcement quoted the editors of one journal as saying that "analysis by race and ethnicity has become an analytical knee-jerk reflex."

Such criticisms suggest that "racial" terms (Black, White, Asian or Caucasoid, Negroid, Mongoloid) reify the "race" notion and perpetuate the simplistic and demonstrably false notion that H. sapiens can genetically be divided into a specific set of 3-8 distinct groups which can then be objectively delineated to everyone's agreement. Humanity can be grouped or classified in many different ways, of course, either genetically (as, for instance, by blood type, lactose tolerance, skin tone, or the neutral markers of prehistoric migrations) or politically (as in U.S. EEOC regulations). And whether any such classification scheme matches any particular individual's notion of "race" depends upon the individual.

Nevertheless, as mentioned above (Differences among U.S. voluntary ethnic self-identity groups), there are considerable public health disparities among U.S. ethnic groups. Clearly, the subject must be studied in order to bring about a more equitable society, and it is hard to see how to do this without labeling the social groups being studied. Similarly (as mentioned in Differences associated with continent-of-ancestry admixture ratios), increasing numbers of public health differences throughout the Western Hemisphere are found to correlate with Euro-Afro-Amerind genetic admixture ratios in New World populations. Again, one must label the phenomenon in order to study it.

The controversy has led to the euphemization of population terms.
One way of trying to avoid the controversy is to coin new terms for population groups, thereby avoiding the word "race." Such terms as populations, groups, clusters, clines, breeds, varieties, and subspecies have all been employed. And although such euphemization may satisfy the lay reader (for example, see Cline), it does not distract the more serious critic. As R.S. Cooper puts it, "Each time the technical facade of these racialist arguments is destroyed, the latest jargon and half-truths from the margins of science are used to rebuild them around the same core belief in Black inferiority. Because race is in part a genetic concept, the advent of molecular DNA technology has opened an important new chapter in this story. Unfortunately, the article... begins from mistaken premises and merely restates the racialist view using the terminology of molecular genetics."

Ultimately, the utility of any classification scheme depends upon the use to which it is put. The traditional sharp U.S. dichotomy between the Black and White "races" is vital to a federal EEOC enforcement officer or litigator. But the same sharp division is useless to someone studying public health problems in Puerto Rico or Brazil. The split in Europe between the Mediterranean populations and the Nordic populations is vital to anyone studying cystic fibrosis or thalassemia. But the same distinction is meaningless to someone examining U.S. affirmative action policies. The problem is that every "racial" label drags in historical connotations.

Ethnicities and admixture ratios align in the United States.
The controversy is aggravated by the alignment between the U.S. endogamous color line and Euro-African admixture ratios. In most New World nations, a scatter diagram of Euro-African genetic admixture yields a single mode, a cluster of dots stretching from preponderantly European to mostly African. Nations differ in where their scatter diagram's cluster is centered, depending on the particular nation’s colonial ratio of African slaves to European colonists. A plot from Argentina would cluster tightly at the European admixture axis with a tenuous tail of dots stretching into the 50-50 Euro-African region. Scatter diagrams of Puerto Rico or the Dominican Republic center on the 50-percent line, with decreasing dot densities stretching all the way to both of the 100-percent axes. Haiti's Euro-African genetic admixture scatter diagram clusters tightly at the African admixture axis with a tenuous tail of dots stretching into the 50-50 Euro-African region, thus resembling an upside-down picture of Argentina's diagram.

Of all the New World nations that imported hundreds of thousands of African slaves, only the United States has a bimodal Euro-African genetic admixture scatter diagram. For four centuries, the United States has been astonishingly successful at preserving two distinct genetic populations: one of mostly African ancestry, the other overwhelmingly European. There is little Euro-African genetic overlap between the U.S. White and Black endogamous groups. Two-thirds of White Americans have no detectable African ancestry at all (other than the ancient African ancestry shared by all members of our species, of course). The remaining one-third have detectable African DNA (averaging 2.3 percent) from ancestors who passed through the endogamous color line from Black to White. And although most Black Americans have some European genetic admixture, it averages only about 15 percent.

It is a historical irony of biomedical population studies that the one nation that enjoys the greatest funding (and consequently the most intense study) is also the one nation where the ethnic self-identity of its two largest endogamous groups correlates strongly with actual genetic admixture. Everywhere else on earth it is legitimate to ask of a population study, "Is this a study of public health differences between voluntary ethnic self-identity groups or is this a study of differences associated with Euro-African genetic admixture?"

The controversy jeopardizes the credibility of otherwise useful studies.
Even in the United States, however, the correlation between ethnicity and genetic admixture is not perfect. About one-third of White Americans have detectable African DNA markers, and about five percent of Black Americans have no detectable African DNA. Moreover, social mobility is ingrained in American popular culture. Consider three Americans: one who self-identifies and is socially accepted as U.S. White (like Carol Channing), another who self-identifies and is socially accepted as U.S. Black (like Gregory Howard Williams), and a third who self-identifies and is socially accepted as U.S. Hispanic (like Geraldo Rivera). If all three have precisely the same Afro-European mix of ancestries (one "mulatto" grandparent), then only an interview will identify their U.S. ethnicities, while only admixture mapping will reveal their genetic traits.

Finally, rather than merely using vague or easy-to-misunderstand terminology to avoid controversy, some biomedical studies simply refrain from explaining how they determined the "race" (or group classification) of their subjects, whether they measured ethnicity by interview, Euro-African genetic admixture, or monogenic traits of isolated populations. They leave it up to the reader to guess.

The confusion engendered by vague or ambiguous terminology in published biomedical reports has not gone unnoticed:
 * According to P. Aspinall: "Race equality as a matter of governance has gained momentum in most Western countries and is reflected in race/ethnicity data collection in administrative systems and the attention accorded to terminology by census agencies. However, the vocabulary of health care--both in its literature and the language of officialdom--has proved resistant to the use of this lexicon of acceptable terms.... What makes such language racist is the historical legacy it carries--that is, its symbolic importance."
 * According to James F. Wilson et al.: "We find that commonly used ethnic labels are both insufficient and inaccurate representations of the inferred genetic clusters.... We note, however, that the complexity of human demographic history means that there is no obvious natural clustering scheme, nor an obvious appropriate degree of resolution."
 * According to R. Bhopal et al.: "Although quality research in this field is most welcome, concern is mounting over the confusing and often inappropriate labeling of populations under study."
 * According to P.J. Aspinall: "Given the widespread and often inconsistent use of this terminology in both text and tables, resulting in confusion or ambiguity about the populations being described, it is important that this issue is addressed."
 * According to M.A. Winker: "Medical definitions of race have lagged behind [in the elimination of the imprecise and inaccurate terms Caucasoid, Negroid, and Mongoloid]...."

Other References

 * Editorial. Genes, drugs and race. Nature Genetics 29, 239 - 240 (2001).
 * Farrer, L. A. et al. Effects of age, sex, and ethnicity on the association between apolipoprotein E genotype and Alzheimer disease. A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium. JAMA 278, 1349-1356 (1997).
 * Hardy, J., Singleton, A. & Gwinn-Hardy, K. Ethnic differences and disease phenotypes. Science 300, 739-740 (2003).
 * Holden, C. Race and medicine. Science 302, 594-596 (2003).
 * Hugot, J. P. et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411, 599-603 (2001).
 * Inoue, N. Lack of common NOD2 variants in Japanese patients with Crohn's disease. Gastroenterology 123, 86-91 (2002).
 * Martin, M. P. et al. Genetic acceleration of AIDS progression by a promoter variant of CCR5. Science 282, 1907-1911 (1998).
 * Martinson, J. J., Chapman, N. H., Rees, D. C., Liu, Y. T. & Clegg, J. B. Global distribution of the CCR5 gene 32-basepair deletion. Nature Genet. 16, 100-103 (1997).
 * Ogura, Y. et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature 411, 603-606 (2001).
 * Wiencke, J. K. Impact of race/ethnicity on molecular pathways in human cancer. Nature Rev. Cancer 4, 79-84 (2003).
 * Yancy, C. D. Does race matter in heart failure. Am. Heart J. 146, 203-206 (2003).