Protein Structure Initiative

The Protein Structure Initiative (PSI) is a 10-year, $764 million effort begun in 2000 to accelerate discovery in structural genomics. Funded by the U.S. National Institute of General Medical Sciences (NIGMS), its aim is to reduce the cost and time required to determine three-dimensional protein structures. The PSI supports over a dozen research centers that build and maintain high-throughput structural genomics pipelines, develop computational protein structure prediction methods, and organize and disseminate information generated by the PSI.

The project has been organized into two separate phases. The first phase of the Protein Structure Initiative (PSI-1) spanned from 2000 to 2005, and was dedicated to demonstrating the feasibility of high-throughput structure determination, solving unique protein structures, and preparing for a subsequent production phase. The second phase, PSI-2, has focused on implementing the high-throughput structure determination methods developed in PSI-1, as well as homology modeling and addressing bottlenecks like modeling membrane proteins.

Phase 1
The first phase of the Protein Structure Initiative (PSI-1) lasted from June 2000 until September 2005, and had a budget of $270 million funded primarily by NIGMS with support from the National Institute of Allergy and Infectious Diseases. PSI-1 saw the establishment of nine pilot centers focusing on structural genomics studies of a range of organisms, including Arabidopsis thaliana, Caenorhabditis elegans and Mycobacterium tuberculosis. During this five-year period over 1,100 protein structures were determined, over 700 of which were classified as "unique" because they had less than 30% sequence similarity with other known protein structures.
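The 30% cutoff above can be illustrated with a minimal sketch. It assumes that "similarity" is measured as percent identity over the aligned, non-gap columns of a pairwise alignment, and that the alignments have already been produced by some other tool; the source does not specify the exact measure the PSI used, and the function names here are hypothetical.

```python
# Assumption: "uniqueness" is approximated by percent identity over
# aligned, non-gap columns. The PSI's exact criterion is not given in the text.
UNIQUE_THRESHOLD = 30.0  # percent, taken from the "< 30%" figure above


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_unique(alignments, threshold: float = UNIQUE_THRESHOLD) -> bool:
    """True if the candidate falls below the threshold against every known structure.

    `alignments` is an iterable of (aligned_candidate, aligned_known) string pairs.
    """
    return all(percent_identity(a, b) < threshold for a, b in alignments)
```

Under these assumptions, a candidate counts as unique only if every pairwise comparison against known structures stays strictly below the threshold; a single close match is enough to disqualify it.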

The primary goal of PSI-1, to develop methods to streamline the structure determination process, resulted in an array of technical advances. Several methods developed during PSI-1 enhanced expression of recombinant proteins in systems like Escherichia coli, Pichia pastoris and insect cell lines. New streamlined approaches to cell cloning, expression and protein purification were also introduced, in which robotics and software platforms were integrated into the protein production pipeline to minimize required manpower, increase speed, and lower costs.

Phase 2
The goal of the second phase of the Protein Structure Initiative (PSI-2) is to use methods introduced in PSI-1 to determine the structures of a large number of proteins and to continue streamlining the structural genomics pipeline. Scheduled to operate between July 2005 and June 2010, PSI-2 has a five-year budget of $325 million provided by NIGMS with support from the National Center for Research Resources. As of November 1, 2008, the Protein Structure Initiative has solved over 3,400 protein structures; over 1,900 of these are unique.

The number of sponsored research centers grew to 14 during PSI-2. The new centers participating in PSI-2 include four specialized centers: the Accelerated Technologies Center for Gene to 3D Structure, the Center for High-Throughput Structural Biology (a branch of the Structural Genomics of Pathogenic Protozoa Consortium taking that institution's place), the Center for Structures of Membrane Proteins, and the New York Consortium on Membrane Protein Structure. Two homology modeling centers, the Joint Center for Molecular Modeling and New Methods for High-Resolution Comparative Modeling, were also added, as well as two resource centers, the PSI Materials Repository and the PSI Structural Genomics Knowledgebase (SGKB). The TB Structural Genomics Consortium was removed from the roster of supported research centers in the transition from PSI-1 to PSI-2.

Launched in February 2008, the SGKB is a free resource that supports searching by protein sequence and keyword, and offers modules describing target selection, experimental protocols, structure models, functional annotation, metrics on overall progress, and updates on structure determination technology. Like the PDB, it is directed by Dr. Helen M. Berman and hosted at Rutgers University.

The PSI Materials Repository, established in 2006 at the Harvard Institute of Proteomics, stores and ships PSI-generated plasmid clones. Clones are annotated and stored in the Plasmid Information Database (PlasmID) upon submission. As of November 16, 2008, there are almost 10,000 PSI-generated plasmid clones available through PlasmID, in addition to over 100,000 clones generated from non-PSI sources. The PSI Materials Repository has a five-year budget of $5.4 million and is under the direction of Dr. Joshua LaBaer.

Impact
As of January 2006, about two-thirds of worldwide structural genomics (SG) output came from PSI centers. Of these PSI contributions, over 20% represented new Pfam families, compared to the non-SG average of 5%. Pfam families represent structurally distinct groups of proteins as predicted from sequenced genomes. PSI centers avoided targeting homologs of known structures by using sequence comparison tools like BLAST and PSI-BLAST. The PSI also discovered more SCOP folds and superfamilies than non-SG efforts, mirroring the difference in new Pfam families: in 2006, 16% of structures solved by the PSI represented new SCOP folds and superfamilies, while the non-SG average was 4%. Solving such novel structures reflects increased coverage of protein fold space, one of the PSI's main goals. Determining the structure of a novel protein allows homology modeling to more accurately predict the fold of other proteins in the same structural family.

While most of the structures solved by the four large-scale PSI centers lack functional annotation, many of the remaining PSI centers determine structures for proteins with known biological function. The TB Structural Genomics Consortium, for example, focuses exclusively on functionally characterized proteins. During its term in PSI-1, it deposited structures for over 70 unique proteins from Mycobacterium tuberculosis, which represented more than 35% of total unique M. tuberculosis structures solved through 2007. In keeping with its biomedical theme of increasing coverage of phosphatomes, the NYSGXRC has determined structures for about 10% of all human phosphatases.

The PSI consortia have provided the overwhelming majority of targets for the Critical Assessment of Techniques for Protein Structure Prediction (CASP), a community-wide, biennial experiment to determine the state and progress of protein structure prediction.

Criticism
The PSI has received notable criticism from the structural biology community. Among the charges is that the main product of the PSI (PDB files of proteins' atomic coordinates as determined by X-ray crystallography or NMR spectroscopy) is not useful enough to biologists to justify the project's $764 million cost. The structures solved by the PSI often do not yield significant functional information on the protein. For example, while the anti-leukemia drug Gleevec was being developed, only the open structure of its target enzyme, Bcr-Abl tyrosine kinase, was available. Since only the kinase's closed form binds Gleevec, the open structure was of no use to Novartis while designing the drug. Critics also note that money currently spent on the PSI could have otherwise funded what they consider worthier causes:

"The $60 million a year in public money that is being spent - I would say, wasted - on the PSI is enough to fund approximately 100-200 individual investigator-initiated research grants. These hypothesis-driven proposals are the lifeblood of the scientific enterprise, and as I have discussed recently in other columns, they are being sucked dry by, among other things, an increasing trend to fund large initiatives at their expense. That $60 million a year would raise the payline at a typical NIH institute by about 6 percentile points, enough to make a huge difference to peer review and to the continuance of a lot of important science."

- Gregory Petsko, PhD