Metagenomics

Metagenomics (also called environmental genomics, ecogenomics, or community genomics) is the study of genetic material recovered directly from environmental samples. Traditional microbiology and microbial genome sequencing rely upon cultivated clonal cultures. This relatively new field of genetic research enables the study of organisms that are not easily cultured in a laboratory, as well as the study of organisms in their natural environment.

Early environmental gene sequencing cloned specific genes (often the 16S rRNA gene) to produce a profile of diversity in a natural sample. Such work revealed that the vast majority of microbial diversity had been missed by cultivation-based methods. Recent studies use "shotgun" Sanger sequencing or chip-based pyrosequencing to obtain largely unbiased samples of all genes from all members of the sampled communities.

Origin of the term
The term "metagenomics" was first used by Jo Handelsman and others in the University of Wisconsin Department of Plant Pathology, and first appeared in publication in 1998. The term metagenome referred to the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome. The exploding interest in environmental genetics, along with the buzzword-like nature of the term, has resulted in the broader use of metagenomics to describe any sequencing of genetic material from environmental (i.e. uncultured) samples, even work that focuses on one organism or gene. Recently, Kevin Chen and Lior Pachter (researchers at the University of California, Berkeley) defined metagenomics as "the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species."

Environmental gene surveys
Conventional sequencing begins with a culture of identical cells as a source of DNA. However, early metagenomic studies revealed that many environments probably contain large groups of microorganisms that cannot be cultured and thus cannot be sequenced in this way. These early studies focused on 16S ribosomal RNA sequences, which are relatively short, often conserved within a species, and generally different between species. Many 16S rRNA sequences have been found that do not belong to any known cultured species, indicating that numerous organisms remain unisolated.
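
Because 16S rRNA sequences tend to be conserved within a species and to differ between species, a survey can group similar sequences and count the groups as a rough measure of diversity. The following is a minimal sketch of that idea, not a method taken from the studies described here. It assumes the sequences are already aligned to equal length, and the 97% identity cutoff, the function names, and the toy sequences are all illustrative assumptions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_sequences(seqs: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Greedy clustering: each sequence joins the first cluster whose
    representative (its first member) it matches at or above the threshold;
    otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for s in seqs:
        for cluster in clusters:
            if identity(s, cluster[0]) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


# Toy data (hypothetical, not real 16S sequences).
base = "ACGT" * 25                      # 100-nt reference fragment
variant = base[:10] + "T" + base[11:]   # one mismatch: 99% identical to base
other = "TGCA" * 25                     # differs from base at every position

groups = cluster_sequences([base, variant, other])
print(len(groups), "sequence groups found")
```

Real surveys align sequences first and use more careful clustering; the result of a greedy pass like this one can depend on input order.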

Early molecular work in the field was conducted by Norman R. Pace and colleagues, who used PCR to explore the diversity of ribosomal RNA sequences. The insights gained from these breakthrough studies led Pace to propose the idea of cloning DNA directly from environmental samples as early as 1985. This led to the first report of isolating and cloning bulk DNA from an environmental sample, published by Pace and colleagues in 1991 while Pace was in the Department of Biology at Indiana University. Considerable efforts ensured that these were not PCR false positives and supported the existence of a complex community of unexplored species. Although this methodology was limited to exploring highly conserved, non-protein coding genes, it did support early microbial morphology-based observations that diversity was far more complex than was known by culturing methods.

Soon after that, in 1995, Healy reported the metagenomic isolation of functional genes from "zoolibraries" constructed from a complex culture of environmental organisms grown in the laboratory on dried grasses. After leaving the Pace laboratory, Ed DeLong continued in the field and has published work that has largely laid the groundwork for environmental phylogenies based on signature 16S sequences, beginning with his group's construction of libraries from marine samples.

Longer sequences from environmental samples
Recovering DNA sequences longer than a few thousand base pairs from environmental samples was very difficult until advances in molecular biological techniques, particularly the construction of libraries in bacterial artificial chromosomes (BACs), provided better vectors for molecular cloning.

Shotgun metagenomics
Advances in bioinformatics, refinements of DNA amplification, and the proliferation of computational power have greatly aided the analysis of DNA sequences recovered from environmental samples. These advances have enabled the adaptation of shotgun sequencing to metagenomic samples. The approach, used to sequence many cultured microorganisms as well as the human genome, randomly shears DNA, sequences many short fragments, and reconstructs them into a consensus sequence. In 2002, Mya Breitbart, Forest Rohwer, and colleagues used environmental shotgun sequencing to show that 200 liters of seawater contains over 5000 different viruses. Subsequent studies showed that there are more than 1000 viral species in human stool and possibly a million different viruses per kilogram of marine sediment, including many bacteriophages. Essentially all of the viruses in these studies were new species. A 2004 metagenomic study of the Sargasso Sea found DNA from nearly 2000 different species, including 148 types of bacteria never seen before.

Also in 2004, Gene Tyson, Jill Banfield, and colleagues at the University of California, Berkeley and the Joint Genome Institute sequenced DNA extracted from an acid mine drainage system. This effort yielded complete, or nearly complete, genomes for a handful of bacteria and archaea that had previously resisted attempts to culture them. It was now possible to study entire genomes without the biases associated with laboratory cultures.

In 2006, Robert Edwards, Forest Rohwer, and colleagues at San Diego State University published the first sequences of environmental samples generated with so-called next-generation sequencing, in this case chip-based pyrosequencing developed by 454 Life Sciences. This technique generates shorter fragments than conventional techniques, but the limitation is compensated for by the very large number of sequences generated. In addition, the technique does not require cloning the DNA before sequencing, which removes one of the main biases in metagenomics.
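
The reconstruction step of shotgun sequencing, in which short reads are merged back into a longer consensus sequence, can be illustrated with a toy example. This is only a sketch under strong assumptions: reads are error-free, all come from the same strand, and the target sequence contains no repeats. The greedy overlap strategy, the function names, and the 20-base "genome" are hypothetical and chosen for illustration; production assemblers use considerably more robust methods.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if no such overlap of at least `min_len` bases exists."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap until
    no overlap of at least `min_len` remains. Returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical 20-base "genome" and four overlapping fragments of it,
# listed out of order as they would be after random shearing.
genome = "ATGGCGTACCTTAGCAGGAT"
reads = [genome[10:18], genome[0:8], genome[14:20], genome[5:13]]

contigs = greedy_assemble(reads)
print(contigs)
```

In a metagenomic sample the reads come from many genomes at once, so assembly typically yields many contigs rather than a single sequence, and closely related organisms can be merged incorrectly.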

Microbial Diversity
Much of the interest in metagenomics comes from the discovery that the vast majority of microorganisms had previously gone unnoticed. Traditional microbiological methods relied upon laboratory cultures of organisms. Surveys of rRNA genes taken directly from the environment revealed that cultivation-based methods find less than 1% of the bacterial and archaeal species in a sample.

Gene Surveys
Shotgun sequencing and screens of clone libraries reveal genes present in environmental samples. This provides information both on which organisms are present and what metabolic processes are possible in the community. This can be helpful in understanding the ecology of a community, particularly if multiple samples are compared to each other.

Environmental Genomes
Shotgun metagenomics is also capable of sequencing nearly complete microbial genomes directly from the environment. Because the collection of DNA from an environment is largely uncontrolled, the most abundant organism types in a sample are most highly represented in the resulting sequence data. Achieving the high coverage needed to fully resolve the genomes of underrepresented community members requires large samples, often prohibitively large ones. On the other hand, the random nature of shotgun sequencing ensures that many of these organisms will be represented by at least some small sequence segments. Due to the limitations of microbial isolation methods, the vast majority of these organisms would go unnoticed using traditional culturing techniques.
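
The relationship between abundance, sequencing effort, and coverage can be made concrete with a back-of-the-envelope calculation. The sketch below is not from the studies cited in this article. It assumes that reads fall uniformly at random across each genome (a Poisson model), that an organism's share of the sequenced bases equals its share of the DNA in the sample, and that the function names and example numbers are purely illustrative.

```python
import math


def expected_coverage(total_bases: float, abundance: float, genome_size: float) -> float:
    """Mean sequencing depth for one community member: its share of all
    sequenced bases divided by the length of its genome."""
    return total_bases * abundance / genome_size


def fraction_covered(coverage: float) -> float:
    """Expected fraction of a genome hit by at least one read, assuming
    reads land uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-coverage)


def bases_needed(target_coverage: float, abundance: float, genome_size: float) -> float:
    """Total bases to sequence so that a member at the given abundance
    reaches the target mean coverage."""
    return target_coverage * genome_size / abundance


# Hypothetical sample: 100 million sequenced bases, 3 Mb genomes.
total = 100e6
genome = 3e6
for label, abundance in [("dominant member", 0.5), ("rare member", 0.001)]:
    c = expected_coverage(total, abundance, genome)
    print(f"{label}: mean coverage {c:.3f}, fraction covered {fraction_covered(c):.3f}")

# Sequencing effort required to bring the rare member to 8x mean coverage.
print(f"bases needed for 8x on the rare member: {bases_needed(8, 0.001, genome):.2e}")
```

Under these assumptions the rare member is touched by only a few reads while the dominant one is covered many times over, which matches the qualitative point above: rare organisms leave traces in the data long before their genomes can be resolved.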

Community metabolism
Many bacterial communities show significant division of labor in metabolism. Waste products of some organisms are metabolites for others. Working together, they turn raw resources into fully metabolized waste. Using comparative gene studies and expression experiments with microarrays or proteomics, researchers can piece together a metabolic network that goes beyond species boundaries. Such studies require detailed knowledge about which versions of which proteins are coded by which species and even by which strains of which species. Community genomic information is therefore, like metabolomics and proteomics, a fundamental part of estimating how metabolites may be transferred and transformed through a community.
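
One simple way to represent such a cross-species metabolic network is as a set of links from an organism that produces a metabolite to an organism that consumes it. The sketch below is illustrative only: the organism names, the metabolites assigned to them, and the function name are hypothetical, and in practice the consume and produce sets would have to be inferred from gene content and expression data.

```python
def cross_feeding_links(members: dict[str, tuple[set[str], set[str]]]) -> list[tuple[str, str, str]]:
    """Return (producer, metabolite, consumer) links for every metabolite
    that one member produces and a different member consumes.

    `members` maps an organism name to a pair (consumes, produces).
    """
    links = []
    for producer, (_, produces) in members.items():
        for consumer, (consumes, _) in members.items():
            if producer == consumer:
                continue
            for metabolite in sorted(produces & consumes):
                links.append((producer, metabolite, consumer))
    return links


# Hypothetical three-member community: name -> (consumes, produces).
community = {
    "organism_A": ({"cellulose"}, {"glucose"}),
    "organism_B": ({"glucose"}, {"acetate", "hydrogen"}),
    "organism_C": ({"acetate", "hydrogen"}, {"methane"}),
}

links = cross_feeding_links(community)
for producer, metabolite, consumer in links:
    print(f"{producer} --{metabolite}--> {consumer}")
```

A dictionary of sets keeps the example short; a graph library would be a natural choice once the network grows or path queries are needed.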

Review articles

 * Eisen, J. A. (2007). Environmental shotgun sequencing: its potential and challenges for studying the hidden world of microbes. PLoS Biology 5(3): e82
 * Green, B. D. & Keller, M. (2006). Capturing the uncultivated majority. Current Opinion in Biotechnology 17[3], 236-240.
 * Handelsman J. (2004). Metagenomics: application of genomics to uncultured microorganisms.  Microbiology and Molecular Biology Reviews 68:669-685.
 * Keller, M. & Zengler, K. (2004). Tapping into microbial diversity. Nature Reviews Microbiology 2[2], 141-150.
 * Lombard, N. et al. (2006). The metagenomics of microbial communities. Biofutur 24-7.
 * Riesenfeld, C. S. et al. (2004). Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38: 525-52.
 * Rodriguez-Valera, F. (2002). Approaches to prokaryotic biodiversity: a population genetics perspective. Environmental Microbiology 4: 628-33.
 * Rodriguez-Valera, F. (2004). Environmental genomics, the big picture? FEMS Microbiology Letters 231: 153-158.
 * Torsvik, V. & Ovreas, L. (2002). Microbial diversity and function in soil: from genes to ecosystems. Current opinion in Microbiology 5: 240-5.
 * Whitaker, R. J. & Banfield, J. F. (2006). Population genomics in natural microbial communities. Trends in Ecology & Evolution 21: 508-16.
 * Worden, A. Z. et al. (2006). In-depth analyses of marine microbial community genomics. Trends in Microbiology 14: 331-6.
 * Xu, J. P. (2006). Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances. Molecular Ecology 15: 1713-31.

Methods

 * Beja, O. et al. (2000). Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environmental Microbiology 2: 516-29.
 * Sebat, J. L. et al. (2003). Metagenomic profiling: Microarray analysis of an environmental genomic library. Applied and Environmental Microbiology 69: 4927-34.
 * Suzuki, M. T. et al. (2004). Phylogenetic screening of ribosomal RNA gene-containing clones in bacterial artificial chromosome (BAC) libraries from different depths in Monterey Bay. Microbial Ecology 48: 473-88.

Bioinformatics

 * Tress, M. L. et al. (2006). An analysis of the Sargasso Sea resource and the consequences for database composition. BMC Bioinformatics 7
 * Foerstner KU, von Mering C, Hooper SD, Bork P (2005) Environments shape the nucleotide composition of genomes. EMBO Rep. 6(12): 1208-13
 * Raes, J., Korbel, J.O., Lercher, M.J., Von Mering, C. & Bork, P. (2007) Prediction of effective genome size in metagenomic samples. Genome Biology 8, R10
 * von Mering, C., Hugenholtz, P., Raes, J., Tringe, S.G., Doerks, T., Jensen, L.J., Ward N. & Bork, P. (2007) Quantitative phylogenetic assessment of microbial communities in diverse environments. Science 315, 1126-1130
 * Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, McHardy AC, Rigoutsos I, Salamov A, Korzeniewski F, Land M, Lapidus A, Grigoriev I, Richardson P, Hugenholtz P, Kyrpides NC. (2007) Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods. 4(6):495-500
 * Markowitz VM, Ivanova N, Palaniappan K, Szeto E, Korzeniewski F, Lykidis A, Anderson I, Mavromatis K, Kunin V, Garcia Martin H, Dubchak I, Hugenholtz P, Kyrpides NC. (2006) An experimental metagenome data management and analysis system. Bioinformatics. 22(14):e359-67

Marine ecosystems

 * Angly, F. E. et al. (2006). The marine viromes of four oceanic regions. PloS Biology 4: 2121-31.
 * Beja, O. et al. (2000). Bacterial rhodopsin: Evidence for a new type of phototrophy in the sea. Science 289: 1902-6.
 * Beja, O. et al. (2001). Proteorhodopsin phototrophy in the ocean. Nature 411: 786-9.
 * Beja, O. et al. (2002). Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature 415: 630-3.
 * Culley, A. I. et al. (2006). Metagenomic analysis of coastal RNA virus communities. Science 312: 1795-8.
 * DeLong, E. F. et al. (2006). Community genomics among stratified microbial assemblages in the ocean's interior. Science 311: 496-503.
 * Hallam, S. J. et al. (2006). Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum. Proceedings of the National Academy of Sciences of the United States of America 103: 18296-301.
 * John, D. E. et al. (2006). Gene diversity and organization in rbcL-containing genome fragments from uncultivated Synechococcus in the Gulf of Mexico. Marine Ecology-Progress Series 316: 23-33.
 * Kannan N. et al. (2007). Structural and Functional Diversity of the Microbial Kinome. PloS Biology 5: 467-478
 * Rusch D. B. et al. (2007). The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PloS Biology 5: 398-431
 * Tringe, S. G. et al. (2005). Comparative metagenomics of microbial communities. Science 308: 554-7.
 * Woyke, T. et al. (2006). Symbiosis insights through metagenomic analysis of a microbial consortium. Nature 443: 950-5.
 * Yooseph S. et al. (2007). The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families. PloS Biology 5: 432-466
 * Yutin, N. & Beja, O. (2005). Putative novel photosynthetic reaction centre organizations in marine aerobic anoxygenic photosynthetic bacteria: insights from metagenomics and environmental genomics. Environmental Microbiology 7: 2027-33.

Sediments

 * Abulencia, C. B., Wyborski, D. L., Garcia, J. A., Podar, M., Chen, W., Chang, S. H. et al. (2006). Environmental whole-genome amplification to access microbial populations in contaminated sediments. Applied and Environmental Microbiology 72[5], 3291-3301.
 * Breitbart et al. (2004).  Diversity and population structure of a nearshore marine sediment viral community. Proceedings of the Royal Society B 271: 565-574.

Extreme environments

 * Baker, B. J. et al. (2006). Lineages of acidophilic archaea revealed by community genomic analysis. Science 314: 1933-5.

Medical Sciences and biotechnological applications

 * Breitbart et al. (2003).  Metagenomic analyses of an uncultured viral community from human feces.  Journal of Bacteriology 185:6220-6223.
 * Gill, S. R. et al. (2006). Metagenomic analysis of the human distal gut microbiome. Science 312: 1355-9.
 * Mathur, E., Toledo, G., Green, B. D., Podar, M., Richardson, T. H., Kulwiec (2005). A biodiversity-based approach to development of performance enzymes: Applied metagenomics and directed evolution. Industrial Biotechnology, 1, 283-287.
 * Schloss, P. D. & Handelsman, J. (2003). Biotechnological prospects from metagenomics. Current Opinion in Biotechnology 14: 303-10.
 * Zengler, K., Paradkar, A., & Keller, M. (2005). New methods to access microbial diversity for small molecule discovery. Natural Products, 275-293.