Threading (protein sequence)

Threading is a method for the computational prediction of protein structure from protein sequence.

Protein threading or fold recognition refers to a class of computational methods for predicting the structure of a protein from its amino acid sequence. The basic idea is that the target sequence (the protein sequence for which the structure is being predicted) is threaded through the backbone structures of a collection of template proteins (known as the fold library) and a “goodness of fit” score is calculated for each sequence-structure alignment. This goodness of fit is often expressed as an empirical energy function, based on statistics derived from known protein structures, but many other scoring functions have been proposed and tried over the years. The most useful scoring functions include both pairwise terms (interactions between pairs of amino acids) and solvation terms. Threading methods share some of the characteristics of both comparative modelling methods (the sequence alignment aspect) and ab initio prediction methods (predicting structure by identifying low-energy conformations of the target protein).
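As a rough illustration of a scoring function that combines pairwise and solvation terms, the following minimal sketch scores a gapless threading in which residue i of the target occupies position i of the template. The potentials, the hydrophobic residue set, and the function names are hypothetical toy choices made for this example; real methods use energy tables derived from statistics on known structures.

```python
# Toy residue classification (an assumption made for illustration only).
HYDROPHOBIC = set("AVLIMFWC")

def pair_energy(a, b):
    """Toy pairwise term: hydrophobic-hydrophobic contacts are favourable."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0
    if a in HYDROPHOBIC or b in HYDROPHOBIC:
        return 0.5
    return 0.0

def solvation_energy(a, buried):
    """Toy solvation term: favour hydrophobic residues in buried positions
    and polar residues in exposed positions."""
    return -0.5 if (a in HYDROPHOBIC) == buried else 0.5

def threading_score(sequence, contacts, buried):
    """Score a gapless threading of `sequence` onto a template.

    contacts: pairs (i, j) of template positions that are in contact
    buried:   one boolean per template position
    Lower (more negative) scores indicate a better fit.
    """
    pair_total = sum(pair_energy(sequence[i], sequence[j]) for i, j in contacts)
    solv_total = sum(solvation_energy(a, b) for a, b in zip(sequence, buried))
    return pair_total + solv_total

# Hypothetical four-position template: positions 0 and 1 are buried and in contact.
template_contacts = [(0, 1)]
template_buried = [True, True, False, False]
good = threading_score("LVKE", template_contacts, template_buried)
poor = threading_score("KELV", template_contacts, template_buried)
```

Here the sequence that places its hydrophobic residues in the buried, contacting positions receives the lower score, which is the behaviour an empirical energy function is meant to capture.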

Fold recognition methods can be broadly divided into two types: (1) methods that derive a 1-D profile for each structure in the fold library and align the target sequence to these profiles, and (2) methods that consider the full 3-D structure of the protein template. A simple example of a profile representation would be to take each amino acid in the structure and label it according to whether it is buried in the core of the protein or exposed on the surface. More elaborate profiles might take into account the local secondary structure (e.g. whether the amino acid is part of an alpha helix) or even evolutionary information (how conserved the amino acid is). In the 3-D representation, the structure is modelled as a set of inter-atomic distances, i.e. the distances are calculated between some or all of the atom pairs in the structure. This is a much richer and far more flexible description of the structure, but it is much harder to use when calculating an alignment. The profile-based fold recognition approach was first described by Bowie, Lüthy and Eisenberg in 1991. The term threading was coined by Jones, Taylor and Thornton in 1992, and originally referred specifically to the use of a full 3-D atomic representation of the protein template in fold recognition. Today, the terms threading and fold recognition are frequently (though somewhat incorrectly) used interchangeably.
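The two representations can be contrasted with a small sketch. It assumes one coordinate per residue (for example, the C-alpha atom) and labels a residue as buried when enough other residues lie within a cutoff radius. The neighbour-count criterion, the default thresholds, and the function names are assumptions chosen for illustration, not the definitions used by any particular published method.

```python
import math

def distance_matrix(coords):
    """3-D representation: distances between all pairs of atoms."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def burial_profile(coords, radius=8.0, min_neighbours=3):
    """1-D representation: one label per residue, 'B' (buried) or 'E' (exposed).

    A residue is called buried when at least `min_neighbours` other residues
    lie within `radius` of it (an illustrative criterion).
    """
    labels = []
    for i, row in enumerate(distance_matrix(coords)):
        neighbours = sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        labels.append("B" if neighbours >= min_neighbours else "E")
    return "".join(labels)

# Hypothetical coordinates: four tightly packed residues and one distant residue.
toy_coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (20, 0, 0)]
toy_profile = burial_profile(toy_coords, radius=2.0, min_neighbours=3)
```

The profile collapses the structure into a string with one symbol per position, which is easy to align against, whereas the distance matrix retains the pairwise geometry that a full 3-D method has to take into account.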

Fold recognition methods are widely used and effective because it is believed that there is a strictly limited number of different protein folds in nature, mostly as a result of evolution but also due to constraints imposed by the basic physics and chemistry of polypeptide chains. There is, therefore, a good chance (currently 70-80%) that a protein with a fold similar to that of the target protein has already been studied by X-ray crystallography or NMR spectroscopy and can be found in the PDB (Protein Data Bank). Currently there are just over 1100 different protein folds known (see CATH database statistics for the latest view), but new folds are still being discovered every year, thanks in part to the ongoing structural genomics projects.

Many different algorithms have been proposed for finding the correct threading of a sequence onto a structure, and many of them make use of dynamic programming in some form. For full 3-D threading, the problem of identifying the best alignment is very difficult (it is an NP-hard problem), and researchers have made use of many combinatorial optimization methods, such as simulated annealing or branch and bound searching, to arrive at heuristic solutions.
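For the profile-based case, dynamic programming can be sketched as a global alignment between the target sequence and a 1-D burial profile. The compatibility score, the linear gap penalty, and the names below are hypothetical choices for illustration. The recurrence works because each position is scored independently of the others; pairwise interaction terms remove that independence, which is one reason full 3-D threading calls for the heuristic searches mentioned above.

```python
# Toy residue classification (an assumption made for illustration only).
HYDROPHOBIC = set("AVLIMFWC")

def match_score(residue, env):
    """Toy compatibility: +1 when a hydrophobic residue sits in a buried ('B')
    position or a polar residue in an exposed ('E') position, otherwise -1."""
    return 1 if (residue in HYDROPHOBIC) == (env == "B") else -1

def align_to_profile(sequence, profile, gap=-2):
    """Return the best global alignment score of a sequence against a profile.

    Higher is better. Uses a Needleman-Wunsch style recurrence with a
    linear gap penalty.
    """
    n, m = len(sequence), len(profile)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # leading residues left unaligned
    for j in range(1, m + 1):
        score[0][j] = j * gap          # leading profile positions skipped
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + match_score(sequence[i - 1], profile[j - 1]),
                score[i - 1][j] + gap,  # residue i aligned to a gap
                score[i][j - 1] + gap,  # profile position j aligned to a gap
            )
    return score[n][m]

compatible = align_to_profile("LVKE", "BBEE")
incompatible = align_to_profile("LVKE", "EEBB")
```

Ranking every template in a fold library by this score, and keeping the best-scoring alignment, gives a bare-bones version of profile-based fold recognition.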

It is interesting to compare threading methods with methods that attempt to align two protein structures (see protein structural alignment); indeed, many of the same algorithms have been applied to both problems.