Searching the conformational space for docking

The search space consists of all possible orientations and conformations of the protein paired with the ligand. With present computing resources, it is impossible to explore this space exhaustively: doing so would require enumerating all possible distortions of each molecule (molecules are dynamic and exist in an ensemble of conformational states) and all possible rotational and translational orientations of the ligand relative to the protein at a given level of granularity. Most docking programs in use account for a flexible ligand, and several attempt to model a flexible protein receptor as well. Each "snapshot" of the pair is referred to as a pose. There are many strategies for sampling the search space; the ones discussed here are molecular dynamics simulations, shape-complementarity methods and genetic algorithms.
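To give a feel for why exhaustive enumeration is out of reach, the following is a rough back-of-the-envelope sketch in Python. The function names, the box size, the step sizes and the number of rotatable bonds are all illustrative assumptions, and the naive three-angle rotation grid overcounts distinct orientations; the point is only the order of magnitude.

```python
def count_rigid_poses(box_length, translation_step, rotation_step_deg):
    """Count rigid-body ligand placements on a regular grid (illustrative only)."""
    # Grid points along one axis of a cubic box, including both ends.
    positions_per_axis = int(round(box_length / translation_step)) + 1
    translations = positions_per_axis ** 3
    # Naive grid over three rotation angles; this overcounts orientations.
    angles_per_axis = int(round(360 / rotation_step_deg))
    rotations = angles_per_axis ** 3
    return translations * rotations


def count_poses_with_torsions(rigid_poses, n_torsions, torsion_step_deg):
    """Multiply in the discrete states of each rotatable ligand bond."""
    states_per_torsion = int(round(360 / torsion_step_deg))
    return rigid_poses * states_per_torsion ** n_torsions


# Hypothetical settings: 20 A box, 0.5 A steps, 10 degree rotations,
# 8 rotatable bonds sampled every 30 degrees.
rigid = count_rigid_poses(box_length=20.0, translation_step=0.5, rotation_step_deg=10)
flexible = count_poses_with_torsions(rigid, n_torsions=8, torsion_step_deg=30)
print(f"{rigid:.3e} rigid poses, {flexible:.3e} with ligand torsions")
```

Even with the protein held rigid, each additional rotatable bond multiplies the count, which is why practical programs sample the space rather than enumerate it.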

Molecular dynamics simulations
In this approach, the protein is typically held rigid while the ligand is allowed to freely explore its conformational space. The generated conformations are then docked successively into the protein, and an MD simulation consisting of a simulated annealing protocol is performed. This is usually supplemented with short energy minimization steps, and the energies determined from the MD runs are used to rank the poses. Although this is a computationally expensive method (potentially involving hundreds of MD runs), it has some advantages. For one, no specialized energy or scoring functions need to be developed: standard MD force fields can typically be used to find reasonable poses that can be compared with experimental structures. A recent work [ref] also docks targets using an adaptation of Distance Constrained Essential Dynamics (DCED), which was used to generate multiple structures for docking (called eigenstructures). Although this approach avoids most of the costly molecular dynamics calculations, it can capture the essential motions of a flexible receptor and so represents a form of coarse-grained dynamics.
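The annealing idea can be illustrated with a minimal sketch. Note that this swaps in Metropolis Monte Carlo moves for true MD integration, and that toy_energy is a hypothetical stand-in for a force-field energy; the cooling schedule, step size and starting point are likewise assumptions chosen only for illustration.

```python
import math
import random


def toy_energy(pose):
    """Hypothetical stand-in for a force-field energy; minimum at (1, -2, 0.5)."""
    x, y, z = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + (z - 0.5) ** 2


def simulated_annealing(energy, start, t_start=5.0, t_end=0.01,
                        steps=5000, step_size=0.5, seed=0):
    """Metropolis-style annealing over a ligand position (not real MD)."""
    rng = random.Random(seed)
    current, e_current = list(start), energy(start)
    best, e_best = list(current), e_current
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        # Propose a small random displacement of the ligand.
        trial = [c + rng.uniform(-step_size, step_size) for c in current]
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


best_pose, best_energy = simulated_annealing(toy_energy, start=(5.0, 5.0, 5.0))
print(best_pose, best_energy)
```

At high temperature the ligand can escape poor local minima; as the temperature drops, the search settles into a low-energy pose. In a docking setting this would be repeated from many starting conformations, which is where the cost comes from.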

Shape-complementarity methods
Perhaps the most common technique used in docking programs, shape-complementarity methods focus on the match between the receptor and the ligand to find an optimal pose. Programs that are good examples of this approach include DOCK, FRED, GLIDE, SLIDE and SURFLEX, among many others. Most shape-complementarity methods describe the molecules of interest in terms of a finite number of descriptors that capture structural complementarity and binding complementarity. Structural complementarity is mostly a geometric description of the molecules of interest, including solvent-accessible surface area, overall shape and geometric constraints between atoms in the protein and ligand. Binding complementarity takes into account features such as hydrogen-bonding interactions, hydrophobic contacts and van der Waals interactions to describe how well a particular ligand will bind to the protein. Both kinds of descriptors are conveniently represented as structural templates, which are then used to quickly match potential compounds (either from a database or from user-given inputs) that will bind well at the active site of the protein.
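As a very rough illustration of a purely geometric descriptor, the sketch below rewards receptor-ligand atom pairs that are in contact and penalizes pairs that clash. It is not the scoring scheme of DOCK, FRED, GLIDE, SLIDE or SURFLEX; the function name, the distance cutoffs and the penalty weight are assumptions made for this example.

```python
import math


def shape_score(receptor_atoms, ligand_atoms,
                clash_dist=2.0, contact_dist=4.0, clash_penalty=10.0):
    """Toy geometric complementarity score: contacts are good, overlaps are bad."""
    score = 0.0
    for r in receptor_atoms:
        for l in ligand_atoms:
            d = math.dist(r, l)
            if d < clash_dist:
                score -= clash_penalty   # atoms overlap: steric clash
            elif d <= contact_dist:
                score += 1.0             # atoms touch: favourable contact
    return score


# Hypothetical coordinates (in angstroms) for a two-atom "pocket".
pocket = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(shape_score(pocket, [(2.0, 0.0, 0.0)]))   # ligand atom centred in the pocket
print(shape_score(pocket, [(0.5, 0.0, 0.0)]))   # ligand atom overlapping a receptor atom
```

Because a score like this needs only distances, it can be evaluated far faster than a force-field energy, which is one reason descriptor-based matching scales to large compound databases.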

Compared to the all-atom molecular dynamics approaches, these methods are very efficient at finding optimal binding poses for the protein and ligand. DOCK in particular (in its current version) includes several improvements that allow one to perform virtual screening while identifying potential matches that may later be used for target refinement. SLIDE handles side-chain as well as main-chain flexibility with the help of a coarse-grained model (ProFlex/FIRST) and supports the placement of ligands into a conformational ensemble of structures. Both programs provide a number of enhancements for anchoring a region of interest in the ligand within the active site of the protein while allowing the rest of the ligand to move freely so that an optimal binding pose can be achieved.

Genetic algorithms
Two of the most used docking programs belong to this class: GOLD and AutoDock. The current version of AutoDock includes a simulated annealing protocol, but the recommended algorithm for docking is a Lamarckian genetic algorithm. Genetic algorithms allow the exploration of a large conformational space (which in this case is spanned jointly by the protein and ligand) by encoding each spatial arrangement of the pair as an individual whose “genes” are its translational, rotational and conformational variables and whose fitness is given by its energy. The population as a whole thus samples the energy landscape that is to be explored. The population is evolved by cross-over techniques similar to biological evolution, in which random pairs of individuals (conformations) are “mated”, with the possibility of a random mutation in the offspring. These methods have proven very useful for sampling the vast state space.
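The selection, cross-over and mutation loop can be sketched as follows. This is a plain genetic algorithm, not the Lamarckian variant used by AutoDock (which adds a local search step) and not GOLD's implementation. The six-number pose encoding, toy_energy, and every parameter value are hypothetical choices for illustration.

```python
import random

# Hypothetical "true" pose: three translations and three rotation-like values.
TARGET = [1.0, -2.0, 0.5, 0.3, -1.2, 2.0]


def toy_energy(pose):
    """Stand-in for a docking energy; lowest at TARGET."""
    return sum((g - t) ** 2 for g, t in zip(pose, TARGET))


def genetic_search(energy, n_genes=6, pop_size=40, generations=60,
                   mutation_rate=0.1, bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Each individual is one pose, encoded as a list of real-valued genes.
    population = [[rng.uniform(lo, hi) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=energy)
        history.append(energy(population[0]))
        parents = population[: pop_size // 2]        # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)  # random mating pair
            cut = rng.randrange(1, n_genes)          # one-point cross-over
            child = mother[:cut] + father[cut:]
            # Occasionally perturb a gene (random mutation).
            child = [g + rng.gauss(0.0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    best = min(population, key=energy)
    history.append(energy(best))
    return best, history


best_pose, history = genetic_search(toy_energy)
print(history[0], history[-1])
```

Because the fitter half of the population survives unchanged, the best energy recorded in history never increases from one generation to the next. A different random seed can lead to a different final pose, which is why several independent runs are normally compared.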

Although genetic algorithms are quite successful at sampling the large conformational space, many docking programs keep the protein fixed and allow only the ligand to flex and adjust to the active site of the protein. Genetic algorithms also require multiple runs to obtain reliable answers about which ligands may bind to the protein. A genetic algorithm typically takes longer to arrive at a proper pose, and hence these methods may not be as efficient as shape-complementarity approaches for screening large databases of compounds. Recent improvements, including grid-based evaluation of energies, restricting the exploration of conformational changes to local areas of interest (active sites), and improved tabling methods, have significantly enhanced the performance of genetic algorithms and made them suitable for virtual screening applications.
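Grid-based energy evaluation can be sketched as follows: the receptor's contribution is computed once at every point of a grid covering the active site, and each candidate pose is then scored by table lookup. The Lennard-Jones-like pair_energy, the nearest-point lookup (real programs typically interpolate between grid points) and all names and sizes here are assumptions for illustration.

```python
import math


def pair_energy(d):
    """Hypothetical 12-6 term with its minimum of -1 at 3.5 A; capped at short range."""
    d = max(d, 0.5)
    return (3.5 / d) ** 12 - 2.0 * (3.5 / d) ** 6


def build_grid(receptor_atoms, origin, spacing, shape):
    """Precompute the receptor's energy contribution at every grid point."""
    grid = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                grid[(i, j, k)] = sum(pair_energy(math.dist(point, a))
                                      for a in receptor_atoms)
    return grid


def grid_energy(grid, origin, spacing, shape, ligand_atoms):
    """Score a pose by nearest-grid-point lookup instead of summing over receptor atoms."""
    total = 0.0
    for atom in ligand_atoms:
        # Snap each coordinate to the nearest grid index, clamped to the box.
        idx = tuple(min(max(round((atom[d] - origin[d]) / spacing), 0), shape[d] - 1)
                    for d in range(3))
        total += grid[idx]
    return total


# Hypothetical one-atom receptor in a 10 A box sampled every 0.5 A.
receptor = [(0.0, 0.0, 0.0)]
origin, spacing, shape = (-5.0, -5.0, -5.0), 0.5, (21, 21, 21)
grid = build_grid(receptor, origin, spacing, shape)
print(grid_energy(grid, origin, spacing, shape, [(3.5, 0.0, 0.0)]))
```

The grid is built once per receptor, so its cost is amortized over the many thousands of poses a genetic algorithm evaluates; restricting the box to the active site keeps the table small.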