International HapMap Project

The International HapMap Project is an organization whose goal is to develop a haplotype map of the human genome (the HapMap), which will describe the common patterns of human genetic variation. The project is a collaboration among researchers at academic centers, non-profit biomedical research groups and private companies in Canada, China, Japan, Nigeria, the United Kingdom, and the United States.

The HapMap is expected to be a key resource for researchers looking for genes that affect health, disease and responses to drugs and environmental factors. The information produced by the project will be made freely available to researchers around the world.

The International HapMap Project officially started with a meeting held from October 27 to 29, 2002, and was expected to take about three years. It comprises two phases: the complete data obtained in Phase I were published on October 27, 2005, and the analysis of the entire dataset was published in October 2007.

Background
Unlike the rarer Mendelian diseases, common diseases (such as diabetes, cancer, heart disease, stroke, depression or asthma) and individual responses to pharmacological agents are shaped by combinations of different genes and the environment. Many examples indicate that common variants in these genes contribute to such conditions; these are sequence variants, or single nucleotide polymorphisms (SNPs), present in at least 5% of individuals in a population. Although any two unrelated people share about 99.5% of their DNA sequence, at a particular site on a chromosome some people may have an A while others have a G instead. Each of these two possibilities is called an allele. Each person has two copies of every chromosome except the sex chromosomes. For each SNP, the combination of alleles a person has is called a genotype. Genotyping refers to any method that can uncover what genotype a person has at a particular site.
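To make the terms concrete, here is a minimal sketch that builds genotypes from pairs of alleles and measures how common the rarer allele is. It assumes a biallelic SNP with single-character alleles such as A and G, and all function names and data are hypothetical. The 5% threshold in the text refers to individuals; many genetic studies instead apply such thresholds to the minor allele frequency (the share of all allele copies), so the sketch computes both.

```python
from collections import Counter

def genotype(allele1: str, allele2: str) -> str:
    """Unordered genotype: A/G and G/A are the same genotype, written 'AG'."""
    return "".join(sorted((allele1, allele2)))

def allele_counts(genotypes: list[str]) -> Counter:
    """Count alleles across people; each genotype contributes two alleles."""
    return Counter("".join(genotypes))

def minor_allele_frequency(genotypes: list[str]) -> float:
    """Share of all allele copies that are the less common allele (0.0 if monomorphic)."""
    counts = allele_counts(genotypes)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / sum(counts.values())

def carrier_fraction(genotypes: list[str]) -> float:
    """Share of people carrying at least one copy of the less common allele."""
    counts = allele_counts(genotypes)
    if len(counts) < 2:
        return 0.0
    # Least common allele; sorting first makes ties resolve deterministically.
    minor = min(sorted(counts), key=lambda allele: counts[allele])
    return sum(minor in g for g in genotypes) / len(genotypes)

# Invented example: 20 people genotyped at one A/G SNP.
people = [genotype("A", "A")] * 18 + [genotype("G", "A"), genotype("G", "G")]
print(minor_allele_frequency(people))  # 0.075 (3 G alleles out of 40)
print(carrier_fraction(people))        # 0.1 (2 of 20 people carry a G)
```

Both measures clear a 5% cut-off here, but they can disagree near the threshold, which is why it matters which one a definition of "common" uses.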

A haplotype is a series of consecutive alleles on a particular chromosome. Haplotypes are broken down every generation by a mechanism called recombination. However, haplotypes in a population were observed to be longer than expected, because recombination occurs preferentially in specific regions known as "recombination hotspots". The stretches between hotspots, where recombination is rare ("recombination cold spots"), are better known as haplotype blocks. Because alleles within a haplotype block are correlated with each other, knowing these structures in a population would enable researchers to infer unknown alleles without genotyping all of the SNPs.
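A minimal sketch of the inference idea, assuming phased haplotypes written as strings with one character per biallelic SNP. It measures how strongly two SNPs are correlated with the standard r² linkage-disequilibrium statistic, then guesses an ungenotyped allele from a genotyped "tag" SNP by majority vote over a reference panel. The panel, function names and majority-vote rule are hypothetical; the vote is a deliberately simple stand-in for real imputation methods, not the HapMap Project's procedure.

```python
from collections import Counter

def allele_freq(haplotypes: list[str], site: int, allele: str) -> float:
    """Fraction of haplotypes carrying `allele` at position `site`."""
    return sum(h[site] == allele for h in haplotypes) / len(haplotypes)

def r_squared(haplotypes: list[str], i: int, j: int) -> float:
    """Squared correlation (r^2) between biallelic SNPs i and j."""
    # Either allele can serve as the reference: the sign of D flips, D^2 does not.
    a = haplotypes[0][i]
    b = haplotypes[0][j]
    p_a = allele_freq(haplotypes, i, a)
    p_b = allele_freq(haplotypes, j, b)
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / len(haplotypes)
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # undefined if a SNP is monomorphic

def infer_allele(haplotypes: list[str], tag: int, target: int, tag_allele: str):
    """Guess the allele at `target` given the allele observed at the `tag` SNP."""
    seen = Counter(h[target] for h in haplotypes if h[tag] == tag_allele)
    if not seen:
        return None  # tag allele never observed in the reference panel
    return seen.most_common(1)[0][0]

# Invented reference panel: six phased haplotypes over three SNPs.
panel = ["AGT", "AGT", "AGC", "CTC", "CTC", "CTC"]
print(round(r_squared(panel, 0, 1), 3))  # 1.0: SNP 0 fully predicts SNP 1
print(round(r_squared(panel, 0, 2), 3))  # 0.5: a weaker correlation
print(infer_allele(panel, 0, 1, "C"))    # T
```

In this toy panel, genotyping SNP 0 is enough to recover SNP 1 exactly, while SNP 2 can only be guessed; r² quantifies that difference.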

Samples used
Haplotypes are generally shared between populations, but their frequency can differ widely. Four populations were selected for inclusion in the HapMap: 30 trios, each made up of an adult and both parents, from Ibadan, Nigeria (YRI); 30 trios of U.S. residents of northern and western European ancestry (CEU); 44 unrelated individuals from Tokyo, Japan (JPT); and 45 unrelated Han Chinese individuals from Beijing, China (CHB). Although the haplotypes revealed from these populations should be useful for studying many other populations, parallel studies are currently examining the usefulness of including additional populations in the project.

All samples have been collected through a community engagement process with appropriate informed consent. The community engagement process was designed to identify and attempt to respond to culturally specific concerns and give participating communities input into the informed consent and sample collection processes.

Scientific strategy
For Phase I, one common SNP was genotyped every 5,000 bases. Overall, more than one million polymorphic SNPs were genotyped. The genotyping was carried out by 10 centres using five different genotyping technologies. Genotyping quality was assessed using duplicate or related samples and through periodic quality checks in which all centres genotyped a common set of SNPs.
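One way related and duplicate samples can expose errors: a child's genotype must be explainable by one allele from each parent, and two genotypings of the same DNA should agree. The sketch below shows both checks. It assumes biallelic SNPs written as sorted two-letter genotype strings such as "AG"; the function names and data are hypothetical, and this is an illustration rather than the centres' actual quality-control pipeline. Not every error is caught this way, since a wrong call can still happen to be consistent with the parents.

```python
def mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if the child's two alleles can be split as one from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

def mendelian_errors(child_calls, mother_calls, father_calls):
    """Indices of SNPs whose trio genotypes violate Mendelian inheritance."""
    return [
        i
        for i, (c, m, f) in enumerate(zip(child_calls, mother_calls, father_calls))
        if not mendelian_consistent(c, m, f)
    ]

def concordance(calls_1, calls_2):
    """Fraction of SNPs where two genotypings of the same sample agree.

    Genotype strings must be normalized (e.g. sorted) so "AG" matches "AG".
    """
    if len(calls_1) != len(calls_2) or not calls_1:
        raise ValueError("need two non-empty call lists of equal length")
    return sum(x == y for x, y in zip(calls_1, calls_2)) / len(calls_1)

# Invented trio genotyped at three SNPs; SNP 1 is inconsistent
# (a GG child cannot have an AA mother).
child = ["AG", "GG", "CT"]
mother = ["AA", "AA", "CC"]
father = ["GG", "AG", "TT"]
print(mendelian_errors(child, mother, father))  # [1]

# Invented duplicate genotyping of one sample at four SNPs.
print(concordance(["AG", "AA", "CT", "GG"], ["AG", "AA", "CC", "GG"]))  # 0.75
```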

The Canadian team was led by Thomas J. Hudson at McGill University in Montreal and focused on chromosomes 2 and 4p. The Chinese team was led by Huanming Yang with centres in Beijing, Shanghai and Hong Kong and focused on chromosomes 3, 8p and 21. The Japanese team was led by Yusuke Nakamura at the University of Tokyo and focused on chromosomes 5, 11, 14, 15, 16, 17 and 19. The British team was led by David R. Bentley at the Sanger Institute and focused on chromosomes 1, 6, 10, 13 and 20. There were four American genotyping centres: a team led by Mark Chee and Arnold Oliphant at Illumina Inc. in San Diego (chromosomes 8q, 9, 18q, 22 and X), a team led by David Altshuler at the Broad Institute in Cambridge, Massachusetts (chromosomes 4q, 7q, 18p, Y and mitochondrion), a team led by Richard A. Gibbs at the Baylor College of Medicine in Houston (chromosome 12) and a team led by Pui-Yan Kwok at the University of California, San Francisco (chromosome 7p).

To obtain enough SNPs to create the map, the Consortium had to fund a large re-sequencing project to discover millions of additional SNPs. As a result, by August 2006 the public databases held more than ten million SNPs, more than 40% of which were known to be polymorphic. By comparison, at the start of the project fewer than 3 million SNPs were known, and no more than 10% of them were known to be polymorphic.

During Phase II, more than two million additional SNPs were genotyped throughout the genome by Perlegen Sciences and 500,000 by Affymetrix.

Data access
All of the data generated by the project, including SNP frequencies, genotypes and haplotypes, were placed in the public domain at http://www.hapmap.org.