Particle swarm optimization

Particle swarm optimization (PSO) is a stochastic, population-based optimization algorithm. It is a form of swarm intelligence grounded in social-psychological principles; besides its many engineering applications, it offers insights into social behavior.

Overview
The particle swarm optimization algorithm was first reported in 1995 by James Kennedy and Russell C. Eberhart. The technique has evolved greatly since then, and the original version of the algorithm is barely recognizable in current ones.

Social influence and social learning enable a person to maintain cognitive consistency. People solve problems by talking with other people about them, and as they interact their beliefs, attitudes, and behaviors change; the changes could typically be depicted as the individuals moving toward one another in a sociocognitive space.

The particle swarm simulates this kind of social optimization. A problem is given, together with some way to evaluate a proposed solution to it (a fitness function). A communication structure or social network is also defined, assigning neighbors for each individual to interact with. Then a population of individuals, defined as random guesses at problem solutions, is initialized and the process is set in motion. The individuals iteratively evaluate their candidate solutions and remember the location of their best success so far, making this information available to their neighbors; they are also able to see where their neighbors have had success. Movements through the search space are guided by these successes, with the population usually converging, by the end of a trial, on a good problem solution.

Important advances in recent years include the application of constriction coefficients and inertia weights; numerous techniques for preventing premature convergence; the introduction of the fully informed particle swarm; many variations on the social network topology; parameter-free, fully adaptive swarms; and some highly simplified models. The algorithm has been analyzed as a dynamical system and has been used in hundreds of engineering applications; it has been used to compose music, to model markets and organizations, and in art installations. The paradigm is new and is still evolving at a fast rate.

The swarm is typically modelled by particles in multidimensional space that have a position and a velocity. These particles fly through hyperspace (i.e., $$\mathbb{R}^n$$) and have two essential reasoning capabilities: their memory of their own best position and knowledge of their neighborhood's best, "best" simply meaning the position with the smallest objective value. Members of a swarm communicate good positions to each other and adjust their own position and velocity based on these good positions. There are two main ways this communication is done:
 * a global best that is known to all and immediately updated when a new best position is found by any particle in the swarm
 * "neighborhood" bests where each particle only immediately communicates with a subset of the swarm about best positions

In the algorithm presented below there is a global best rather than neighborhood bests. Neighborhood bests allow parallel exploration of the search space and reduce the susceptibility of PSO to falling into local minima, at the cost of slower convergence. Note that neighborhoods merely slow down the proliferation of new bests, rather than creating isolated subswarms, because neighborhoods overlap: in a ring topology with neighborhoods of size 3, say, particle 2 would communicate only with particles 1 and 3, particle 3 only with particles 2 and 4, and so on. A new best position discovered by particle 2's neighborhood would then reach particle 1's neighborhood at the next iteration of the PSO algorithm presented below. Smaller neighborhoods lead to slower convergence and larger neighborhoods to faster convergence, with a global best representing a neighborhood consisting of the entire swarm.
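As a minimal sketch of the neighborhood idea (the function name and list layout here are our own, not part of any standard PSO formulation), each particle in a ring topology consults only its own personal best and those of its immediate neighbors, with indices wrapping around the ends of the array:

```python
def neighborhood_best(pbest_val, i, k=1):
    """Index of the best personal-best value among particle i and its k
    neighbors on either side in a ring topology (indices wrap around)."""
    n = len(pbest_val)
    candidates = [(i + d) % n for d in range(-k, k + 1)]
    return min(candidates, key=lambda j: pbest_val[j])
```

A neighborhood best then replaces the global best in each particle's velocity update; with `k` large enough to span the whole swarm, this reduces to the global-best scheme.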

A single particle by itself is unable to accomplish anything. The power is in interactive collaboration.

A basic, canonical PSO algorithm
Let $$f : \mathbb{R}^m \rightarrow \mathbb{R}$$ be the objective function. Let there be $$n$$ particles, each with associated positions $$\mathbf{x}_i \in \mathbb{R}^m$$ and velocities $$\mathbf{v}_i \in \mathbb{R}^m$$, $$i = 1, \ldots, n$$. Let $$\hat{\mathbf{x}}_i$$ be the current best position of each particle and let $$\hat{\mathbf{g}}$$ be the global best.


 * Initialize $$\mathbf{x}_i$$ and $$\mathbf{v}_i$$ for all $$i$$. One common choice is to take $$x_{ij} \sim U(a_j, b_j)$$ and $$\mathbf{v}_i = \mathbf{0}$$ for all $$i$$ and $$j = 1, \ldots, m$$, where $$a_j, b_j$$ are the limits of the search domain in each dimension.
 * $$\hat{\mathbf{x}}_i \leftarrow \mathbf{x}_i$$ and $$\hat{\mathbf{g}} \leftarrow \arg\min_{\mathbf{x}_i} f(\mathbf{x}_i), i = 1, \ldots, n$$.


 * While not converged:
   * For $$1 \leq i \leq n$$:
     * $$\mathbf{v}_i \leftarrow {\omega}\mathbf{v}_i+c_1\mathbf{r}_1\circ(\hat{\mathbf{x}}_i-\mathbf{x}_i)+c_2\mathbf{r}_2\circ(\hat{\mathbf{g}}-\mathbf{x}_i)$$.
     * $$\mathbf{x}_i \leftarrow \mathbf{x}_i+\mathbf{v}_i$$.
     * If $$f(\mathbf{x}_i) < f(\hat{\mathbf{x}}_i)$$, then $$\hat{\mathbf{x}}_i \leftarrow \mathbf{x}_i$$.
     * If $$f(\mathbf{x}_i) < f(\hat{\mathbf{g}})$$, then $$\hat{\mathbf{g}} \leftarrow \mathbf{x}_i$$.

Note the following about the above algorithm:
 * $$\omega$$ is an inertial constant. Good values are usually slightly less than 1.
 * $$c_1$$ and $$c_2$$ are constants that say how much the particle is directed towards good positions. They represent a "cognitive" and a "social" component, respectively, in that they affect how much the particle's personal best and the global best (respectively) influence its movement. Usually we take $$c_1, c_2 \approx 2$$.
 * $$\mathbf{r}_1, \mathbf{r}_2$$ are two random vectors with each component generally a uniform random number between 0 and 1.
 * The $$\circ$$ operator indicates element-by-element multiplication, i.e. the Hadamard product.


 * Note that there is a misconception arising from the tendency to write the velocity formula in "vector notation": the original intent (see Maurice Clerc's Particle Swarm Optimization, 2006) is to draw a new random number for each dimension of each particle, rather than multiplying every dimension by the same random number. In the constriction formulation, the inertia and attraction coefficients are both derived from a single parameter $$\varphi > 4$$ through the constriction coefficient $$\chi = 2/\big(\varphi - 2 + \sqrt{\varphi^2 - 4\varphi}\big)$$, taking $$\omega = \chi$$ and $$c_1 = c_2 = \chi\varphi/2$$; the usual choice $$\varphi = 4.1$$ gives "confidence coefficients" of approximately $$\omega \approx 0.73$$ and $$c_1 = c_2 \approx 1.49$$. The pseudo code shown below draws the random numbers per dimension, describing the intent correctly.

Pseudo code
Here follows a pseudo code example of the position update of the swarm. Note that the random vectors $$\mathbf{r}_1, \mathbf{r}_2$$ are implemented as scalars inside the dimension loop which is equivalent to the mathematical description given above.

for I = 1 to number of particles n do
    for J = 1 to number of dimensions m do
        R1 = uniform random number
        R2 = uniform random number
        V[I][J] = w*V[I][J]
                  + C1*R1*(P[I][J]-X[I][J])
                  + C2*R2*(G[J]-X[I][J])
        X[I][J] = X[I][J] + V[I][J]
    enddo
enddo
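For concreteness, here is a minimal, self-contained Python sketch of the canonical algorithm; the function name, signature, and default parameter values are our own illustrative choices, not prescribed by the original sources:

```python
import random

def pso(f, bounds, n_particles=30, iters=200, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Minimize f over the box [(a_1, b_1), ..., (a_m, b_m)] with a basic PSO."""
    rng = random.Random(seed)
    m = len(bounds)
    # Positions start uniformly random in the box; velocities start at zero.
    x = [[rng.uniform(a, b) for a, b in bounds] for _ in range(n_particles)]
    v = [[0.0] * m for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]              # personal best positions
    pbest_val = [f(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(m):
                r1, r2 = rng.random(), rng.random()  # fresh draws per dimension
                v[i][j] = (w * v[i][j]
                           + c1 * r1 * (pbest[i][j] - x[i][j])
                           + c2 * r2 * (gbest[j] - x[i][j]))
                x[i][j] += v[i][j]
            fx = f(x[i])
            if fx < pbest_val[i]:            # update personal best
                pbest[i], pbest_val[i] = x[i][:], fx
                if fx < gbest_val:           # update global best
                    gbest, gbest_val = x[i][:], fx
    return gbest, gbest_val
```

Minimizing the sphere function $$f(\mathbf{x}) = \sum_j x_j^2$$ over $$[-5, 5]^2$$ with these defaults drives the swarm very close to the optimum at the origin.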

Discussion
By studying this algorithm, we see that we are essentially carrying out something like a discrete-time simulation, where each iteration represents a "tic" of time. The particles communicate the information they find to each other by updating their velocities in terms of the local and global bests; when a new best is found, the particles change their positions accordingly, and the new information is thereby "broadcast" to the swarm. The particles are always drawn back both to their own personal best positions and to the best position of the entire swarm. They also have stochastic exploration capability via the random multipliers $$\mathbf{r}_1, \mathbf{r}_2$$. The vector, floating-point nature of the algorithm suggests that high-performance implementations could be created to take advantage of modern hardware vectorization extensions, such as Streaming SIMD Extensions and AltiVec.
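To illustrate that vectorization point, the per-iteration update can be expressed over whole arrays at once, letting the array library dispatch to SIMD hardware. This is a sketch under our own naming conventions (NumPy stands in for hand-written SIMD code):

```python
import numpy as np

def pso_step(x, v, pbest, gbest, rng, w=0.72, c1=1.49, c2=1.49):
    """One vectorized PSO update. x, v, pbest are (n, m) arrays of
    positions, velocities, and personal bests; gbest is an (m,) array."""
    n, m = x.shape
    r1 = rng.random((n, m))   # a fresh random number for every dimension
    r2 = rng.random((n, m))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```

All $$n \times m$$ velocity components are updated in a handful of array operations, rather than in a doubly nested scalar loop.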

Typical convergence conditions include reaching a certain number of iterations, reaching a certain fitness value, and so on.

Variations and practicalities
There are a number of considerations in using PSO in practice; one might wish to clamp the velocities to a certain maximum magnitude, for instance. The considerable adaptability of PSO to variations and hybrids is seen as a strength over other robust evolutionary optimization mechanisms, such as genetic algorithms. For example, one common, reasonable modification is to add a probabilistic bit-flipping local search heuristic to the loop. Normally, a stochastic hill-climber risks getting stuck at local optima, but the stochastic exploration and communication of the swarm overcomes this. Thus, PSO can be seen as a basic search "workbench" that can be adapted as needed for the problem at hand.
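Velocity clamping, for instance, can be as simple as the following sketch (the function name and the choice of a single symmetric bound per component are our own assumptions):

```python
def clamp_velocity(v, vmax):
    """Clamp each velocity component to the interval [-vmax, vmax],
    preventing particles from overshooting the search region."""
    return [max(-vmax, min(vmax, vj)) for vj in v]
```

A common rule of thumb is to set `vmax` to some fraction of the search range in each dimension.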

Note that the research literature has uncovered many heuristics and variants determined to be better with respect to convergence speed and robustness, such as clever choices of $$\omega$$, $$c_i$$, and $$\mathbf{r}_i$$. There are also other variants of the algorithm, such as discretized versions for searching over subsets of $$\mathbb{Z}^n$$ rather than $$\mathbb{R}^n$$. There has also been experimentation with coevolutionary versions of the PSO algorithm, with good results reported. Very frequently the value of $$\omega$$ is taken to decrease over time; e.g., one might run the PSO for a certain number of iterations and decrease $$\omega$$ linearly from a starting value (0.9, say) to a final value (0.4, say), in order to favor exploitation over exploration in later stages of the search. The literature is full of such heuristics. In other words, the canonical PSO algorithm is outperformed on several common function optimization benchmarks by the various improvements that have been developed, and consulting the literature for parameter choices and variants suited to a particular problem is likely to be helpful.
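The linearly decreasing inertia weight mentioned above amounts to a simple interpolation per iteration (function and parameter names are ours; the 0.9-to-0.4 schedule is the commonly cited example):

```python
def inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Inertia weight at iteration t of t_max, interpolated linearly
    from w_start (exploration) down to w_end (exploitation)."""
    return w_start + (w_end - w_start) * t / t_max
```

The returned value would replace the constant $$\omega$$ in the velocity update at each iteration.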

Significant, non-trivial modifications have been developed for multi-objective optimization, versions designed to find solutions satisfying linear or non-linear constraints, as well as "niching" versions designed to find multiple solutions to problems where it is believed or known that there are multiple global minima which ought to be located. For an up-to-date literature survey and in-depth discussion of PSO along with the related paradigm of ant colony optimization and most of the documented modifications and heuristics mentioned above, see Engelbrecht's book below.

Applications
Although a relatively new paradigm, PSO has been applied to a variety of tasks, such as the training of artificial neural networks and finite element model updating. More recently, PSO has been applied in combination with grammatical evolution to create a hybrid optimization paradigm called "grammatical swarm". Engelbrecht's book has a chapter on applications of PSO.