Molecular docking

Molecular docking is a growing application of computational biology in which molecular modeling techniques are used to predict how a macromolecule (typically a protein) interacts with other molecules (other proteins, nucleic acids or small drug-like molecules). The ability of a protein to interact with small molecules governs a significant part of the protein’s dynamics, which may enhance or inhibit its biological function. This plays an important role in the rational design of drugs. The ability to bind large molecules such as other proteins and nucleic acids to form supramolecular complexes is also known to play an important role in controlling biological pathways. Given this biological significance, considerable effort has been directed toward understanding the docking process.

Classifying Molecular docking
Typically molecular docking can be classified as:


 * Protein-small molecule Docking
 * Protein-nucleic acid Docking
 * Protein-protein Docking

Although the guiding principles of molecular docking are much the same in all cases, the techniques employed in each of the three types of docking are quite different. Protein-small molecule (protein-ligand) docking models the interaction between the protein and ligand: if the geometry of the pair is complementary and involves favorable biochemical interactions, the ligand will potentially bind the protein in vitro or in vivo. Protein-ligand docking represents the simpler end of the complexity spectrum, and many available programs perform particularly well in predicting molecules that may potentially inhibit proteins.

Protein-protein docking is typically much more complex. This is because proteins are flexible, their conformational space is quite vast, and sampling all of these possible conformations for a protein is quite difficult. More often than not, a protein exhibits considerable conformational change in order to bind to its partner protein, and current programs do not perform very well in predicting these vast conformational changes. However, there are programs that have been quite successful in predicting protein-protein complexes.

Protein-nucleic acid (either DNA or RNA) docking is intermediate in complexity in terms of the relative sizes of the protein and ligand; however, few available programs perform satisfactorily. This is partly because nucleic acids are highly charged molecules, and the exact nature of the interactions between a protein and a nucleic acid is not particularly well understood.

Problem Definition
Molecular docking can be thought of as a “lock-and-key” problem, where, given a set of keys, one is interested in finding the correct “key” that opens a “lock”. Here, the protein can be thought of as the “lock” and the ligand as the “key”. The difference between the “lock-and-key” problem and molecular docking is that potentially multiple “keys” (ligands of various sizes, possibly including other proteins) can bind to the same protein. It is thus possible to define molecular docking as an optimization problem that seeks the “best-fit” ligand for a particular protein of interest.

Proteins are a class of molecules that inherently exhibit complex dynamics and can undergo considerable changes in conformation in response to a particular ligand. This implies that the “best-fit” ligand might have to bind to an ensemble of the protein’s conformations while minimizing the energy and maximizing the entropy of the overall system. An illustration of this is shown below. During the process, the ligand and the protein adjust their conformations to achieve an overall “best-fit”; this kind of conformational adjustment leading to binding is referred to as “induced-fit”.



The focus of molecular docking, though, is not molecular recognition; it presumes that the molecules that would bind to each other have already “recognized” each other. The focus is to find an optimized conformation for both the protein and the ligand such that the energy of the system is minimized while the overall entropy remains high. Thus, the optimization problem for molecular docking is twofold: finding an optimal geometry (shape complementarity) in which the protein and ligand fit together, and finding the state that minimizes the energy while maximizing the entropy of the overall system.
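
The trade-off between energy and entropy described here is conventionally summarized by the Gibbs free energy of binding. The relation below is standard thermodynamics rather than something specific to any docking program; ΔH is the enthalpy (energy) change on binding, T the absolute temperature and ΔS the entropy change, and a more negative ΔG corresponds to more favorable binding.

```latex
\Delta G_{\mathrm{bind}} = \Delta H - T\,\Delta S
```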

Approaches to Molecular docking
Two approaches are particularly popular within the molecular docking community. One approach uses a matching technique that describes the protein and the ligand as complementary surfaces. The second approach simulates the actual docking process in which the ligand-protein pairwise interaction energies are calculated. Both approaches have significant advantages as well as some limitations. These are outlined below.

Shape Complementarity Methods
Geometric matching (shape complementarity) methods describe the protein and ligand as sets of features that make them dockable. These features may include molecular surface and complementary surface descriptors. In this case, the receptor’s molecular surface is described in terms of its solvent-accessible surface area, and the ligand’s molecular surface is described in terms of its matching surface description. The complementarity between the two surfaces provides a shape-matching description that may help find the complementary pose for docking the target and ligand molecules. Another approach is to describe the hydrophobic features of the protein using turns in the main-chain atoms. Yet another approach is to use a Fourier shape descriptor technique described in [ref]. Shape complementarity approaches are typically fast and robust, but they usually cannot model movements or dynamic changes in the ligand or protein conformations accurately, although recent developments allow these methods to investigate ligand flexibility. They can quickly scan through several thousand ligands in a matter of seconds to assess whether they can bind at the protein’s active site, and they usually scale even to protein-protein interactions. They are also much more amenable to pharmacophore-based approaches, since they use geometric descriptions of the ligands to find optimal binding.
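
The idea of matching complementary surfaces can be illustrated with a deliberately simplified sketch. It assumes a 2D grid, a rigid ligand and translations only; the function names, the clash penalty of -10 and the toy shapes are all hypothetical choices made for illustration and do not come from any particular docking program. Ligand cells that land on the receptor's surface layer are rewarded, and cells that overlap the receptor interior are penalized.

```python
from itertools import product

def surface_layer(core):
    """Empty grid cells that share an edge with an occupied receptor cell."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return {(x + dx, y + dy) for x, y in core for dx, dy in steps} - core

def shape_score(core, surface, ligand, shift, clash_penalty=-10):
    """Score one rigid translation of the ligand (higher is better)."""
    dx, dy = shift
    score = 0
    for x, y in ligand:
        cell = (x + dx, y + dy)
        if cell in core:
            score += clash_penalty   # ligand overlaps the receptor interior
        elif cell in surface:
            score += 1               # ligand cell touches the receptor surface
    return score

def best_translation(core, ligand, search_range):
    """Exhaustively try every translation in the range and keep the best one."""
    surface = surface_layer(core)
    shifts = product(search_range, repeat=2)
    return max(shifts, key=lambda s: shape_score(core, surface, ligand, s))

# Toy receptor (assumed): a 5 x 3 block with a two-cell-deep pocket at x = 2.
core = {(x, y) for x in range(5) for y in range(3)} - {(2, 1), (2, 2)}
# Toy ligand (assumed): a T shape whose stem fits into the pocket.
ligand = {(0, 0), (0, 1), (-1, 2), (0, 2), (1, 2)}

best = best_translation(core, ligand, range(-4, 8))
print(best)  # translation that places the stem of the T inside the pocket
```

This brute-force search is only practical for tiny grids. In 3D, the same kind of grid correlation is commonly accelerated with fast Fourier transforms, and rotations of the ligand have to be sampled as well.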

Simulation Processes
Simulating the docking process itself is much more complicated. In this approach, the protein and the ligand are initially separated by some physical distance, and the ligand finds its position in the protein’s active site after a certain number of “moves” in its conformational space. The moves include rigid-body transformations such as translations and rotations, as well as internal changes to the ligand’s structure, including torsion angle rotations. Each move changes the total energy of the system, so the total energy is recalculated after every move. The obvious advantage of this method is that it is more amenable to incorporating ligand flexibility, whereas shape complementarity techniques have to use ingenious methods to incorporate flexibility in ligands. Another advantage is that the process is physically closer to what happens in reality, when the protein and ligand approach each other after molecular recognition. A clear disadvantage is that it takes longer to evaluate the optimal binding pose, since a rather large energy landscape has to be explored. However, grid-based techniques and fast optimization methods have significantly ameliorated this problem.
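
A minimal sketch of such a move-and-evaluate loop is given below. It makes several simplifying assumptions that are not part of the description above: the system is 2D, the receptor is fixed, only rigid-body moves are used (no torsion rotations), the energy is a toy Lennard-Jones-style term in arbitrary units, and moves are accepted with a Metropolis criterion. All names, step sizes and the temperature parameter `kT` are hypothetical.

```python
import math
import random

def rigid_move(ligand, dx, dy, angle):
    """Rotate the ligand about its centroid by `angle`, then translate by (dx, dy)."""
    cx = sum(x for x, _ in ligand) / len(ligand)
    cy = sum(y for _, y in ligand) / len(ligand)
    c, s = math.cos(angle), math.sin(angle)
    return [(cx + c * (x - cx) - s * (y - cy) + dx,
             cy + s * (x - cx) + c * (y - cy) + dy) for x, y in ligand]

def toy_energy(ligand, receptor):
    """Sum of 12-6 Lennard-Jones-style pair terms (sigma = epsilon = 1, assumed)."""
    total = 0.0
    for lx, ly in ligand:
        for rx, ry in receptor:
            r2 = max((lx - rx) ** 2 + (ly - ry) ** 2, 1e-6)  # avoid division by zero
            inv6 = 1.0 / r2 ** 3
            total += 4.0 * (inv6 * inv6 - inv6)
    return total

def monte_carlo_dock(ligand, receptor, steps=2000, kT=0.5, seed=0):
    """Propose random rigid moves; accept or reject each with the Metropolis rule."""
    rng = random.Random(seed)
    current, e_current = ligand, toy_energy(ligand, receptor)
    best, e_best = current, e_current
    for _ in range(steps):
        trial = rigid_move(current, rng.gauss(0, 0.3), rng.gauss(0, 0.3), rng.gauss(0, 0.2))
        e_trial = toy_energy(trial, receptor)   # energy is recomputed after every move
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

receptor = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ligand = [(0.5, 4.0), (1.5, 4.0)]          # starts well away from the receptor
start_energy = toy_energy(ligand, receptor)
best_pose, best_energy = monte_carlo_dock(ligand, receptor)
print(start_energy, best_energy)
```

Because every pose requires a full energy evaluation, the cost grows with the number of moves and the number of atom pairs, which is the practical reason such simulations are slower than shape matching.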

The Mechanics of Docking
To perform a docking screen, the first requirement is a structure of the protein of interest. Usually the structure has been determined using a biophysical technique such as X-ray crystallography or, less often, NMR spectroscopy. This protein structure and a database of potential ligands serve as inputs to a docking program. The success of a docking program depends on two components: the search algorithm and the scoring function.

The search algorithm
The search space consists of all possible orientations and conformations of the protein paired with the ligand. With present computing resources, it is impossible to explore the search space exhaustively; this would involve enumerating all possible distortions of each molecule (molecules are dynamic and exist in an ensemble of conformational states) and all possible rotational and translational orientations of the ligand relative to the protein at a given level of granularity. Most docking programs in use account for a flexible ligand, and several attempt to model a flexible protein receptor. Each "snapshot" of the pair is referred to as a pose. There are many strategies for sampling the search space. Here are some examples:
 * Use a coarse-grained molecular dynamics simulation to propose energetically reasonable poses
 * Use a "linear combination" of multiple structures determined for the same protein to emulate receptor flexibility
 * Use a genetic algorithm to "evolve" new poses that are successively more likely to represent favorable binding interactions
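
As an illustration of the last strategy, the sketch below evolves a population of poses under a few stated assumptions: a pose is reduced to three numbers (x, y and a rotation angle), the scoring function `toy_score` is a hypothetical stand-in with a known minimum rather than a real docking score, and the selection, crossover and mutation settings are arbitrary choices for the example.

```python
import random

def toy_score(pose):
    """Hypothetical score (lower is better) with its minimum at (1.0, -2.0, 0.5)."""
    x, y, theta = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + (theta - 0.5) ** 2

def evolve_poses(score, pop_size=30, generations=40, seed=0):
    """Evolve poses by selection, uniform crossover and Gaussian mutation."""
    rng = random.Random(seed)
    population = [tuple(rng.uniform(-5, 5) for _ in range(3)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[:pop_size // 2]       # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = tuple(rng.choice(pair) for pair in zip(a, b))   # uniform crossover
            child = tuple(g + rng.gauss(0, 0.1) for g in child)     # mutation
            children.append(child)
        population = parents + children            # parents survive unchanged (elitism)
    return min(population, key=score)

initial_best = evolve_poses(toy_score, generations=0)   # best pose of the random start
final_best = evolve_poses(toy_score, generations=40)
print(toy_score(initial_best), toy_score(final_best))
```

Keeping the parents unchanged guarantees that the best score never gets worse from one generation to the next; a real implementation would also encode ligand torsion angles in each pose.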

The scoring function
The scoring function takes a pose as input and returns a number indicating the likelihood that the pose represents a favorable binding interaction.

Most scoring functions are physics-based molecular mechanics force fields that estimate the energy of the pose; a low (negative) energy indicates a stable system and thus a likely binding interaction. An alternative approach is to derive a statistical potential for interactions from a large database of protein-ligand complexes, such as the Protein Data Bank, and evaluate the fit of the pose according to this inferred potential.
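
A force-field-style score of this kind can be sketched as a sum of pairwise terms. The example below assumes only a 12-6 Lennard-Jones term and a Coulomb term with a constant dielectric; the parameter values (`epsilon`, `sigma`, `dielectric`) are placeholders chosen for illustration, not values from any published force field, and real scoring functions include further terms such as hydrogen bonding and desolvation.

```python
import math

COULOMB_K = 332.06  # approximate Coulomb constant in kcal*Angstrom/(mol*e^2)

def pair_energy(r, q1, q2, epsilon=0.2, sigma=3.4, dielectric=4.0):
    """Lennard-Jones plus Coulomb energy for one atom pair at distance r (Angstrom)."""
    r = max(r, 1e-3)                         # guard against division by zero
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_K * q1 * q2 / (dielectric * r)
    return lj + coulomb

def score_pose(protein_atoms, ligand_atoms):
    """Sum pair energies over all protein-ligand atom pairs; atoms are (x, y, z, charge)."""
    total = 0.0
    for px, py, pz, pq in protein_atoms:
        for lx, ly, lz, lq in ligand_atoms:
            r = math.sqrt((px - lx) ** 2 + (py - ly) ** 2 + (pz - lz) ** 2)
            total += pair_energy(r, pq, lq)
    return total

protein = [(0.0, 0.0, 0.0, -0.5)]
print(score_pose(protein, [(3.8, 0.0, 0.0, 0.5)]))   # opposite charges at contact distance
print(score_pose(protein, [(1.5, 0.0, 0.0, 0.5)]))   # atoms overlapping: steric clash
```

A pose with well-placed opposite charges scores negative (favorable), while a pose with overlapping atoms is dominated by the repulsive term and scores strongly positive.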

There are many X-ray diffraction structures of complexes between proteins and high-affinity ligands, but very few for low-affinity ligands, as these do not stay bound long enough to be observed. Scoring functions trained on these data can dock high-affinity ligands correctly, but they also give plausible docked conformations for ligands that are actually inactive. This produces a large number of false positive hits, i.e., ligands predicted to bind to the protein that do not bind when placed together in a test tube.

One way to reduce the number of false positives is to recalculate the energy of the top-hit poses using a higher-resolution (and therefore slower) technique such as Generalized Born or Poisson-Boltzmann methods. However, a researcher will typically screen a database of tens to hundreds of thousands of compounds and test the top 60 or so in vitro, and identifying any true binders is still considered a success.
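
The two-stage workflow described above (fast scoring of the whole database, then slower rescoring of the top hits) can be sketched as follows. The functions `fast_score` and `slow_score` are hypothetical callables supplied by the caller, standing in for a docking score and a more expensive rescoring method; lower values are assumed to mean better predicted binding.

```python
def screen(compounds, fast_score, slow_score, top_n=60):
    """Rank all compounds with the fast score, then rescore only the best top_n."""
    ranked = sorted(compounds, key=fast_score)      # cheap pass over the whole database
    shortlist = ranked[:top_n]                      # top hits kept for closer inspection
    return sorted(shortlist, key=slow_score)        # expensive pass on the shortlist only

# Made-up scores for four compounds, purely for illustration.
fast = {"a": -9.0, "b": -8.0, "c": -2.0, "d": -7.0}
slow = {"a": -3.0, "b": -6.0, "c": -10.0, "d": -5.0}

hits = screen(list(fast), fast.get, slow.get, top_n=3)
print(hits)  # ['b', 'd', 'a']
```

Compound "c" illustrates the limitation of this scheme: it would have scored best in the slow method but is never rescored, because the fast score excluded it from the shortlist.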

Applications
A binding interaction may mean that the ligand inhibits the protein's function or acts as an agonist. Docking is most pertinent to the field of drug design: most drugs are small molecules, and a computational approach allows researchers to quickly screen large databases of potential drugs (e.g., the ZINC database of compounds for virtual screening) against protein targets such as HIV reverse transcriptase. Traditional discovery of drug candidates, by contrast, occurs by chance or through painstaking work in the lab. Virtual screening and related combinatorial chemistry techniques are particularly important in the search for new antibiotics, as resistant strains of bacteria increasingly appear due to overuse of antibiotics.