Nonlinear dimensionality reduction

High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lies on a nonlinear manifold embedded within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space.

Summarized below are some important algorithms from the history of manifold learning and nonlinear dimensionality reduction. Many of these nonlinear methods are related to the linear methods listed below. The nonlinear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that only give a visualisation. Typically, those that only give a visualisation are based on proximity data, that is, distance measurements.

Linear methods

 * Independent component analysis (ICA).
 * Principal component analysis (PCA), also called the Karhunen-Loève transform (KLT).
 * Singular value decomposition (SVD).
 * Factor analysis.

Non-linear mappings
Perhaps the principal method amongst those that provide a mapping from the high-dimensional space to the embedded space is kernel PCA. This method provides a nonlinear principal component analysis (PCA) by applying the kernel trick. Kernel PCA first (implicitly) constructs a higher-dimensional feature space, in which there are a large number of linear relations between the dimensions. Subsequently, the low-dimensional data representation is obtained by applying traditional PCA in that space.
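The two steps described above (an implicit feature space defined by a kernel, followed by ordinary PCA) can be sketched with NumPy. This is a minimal illustration rather than a reference implementation: the Gaussian (RBF) kernel, the parameter `gamma`, the function name `rbf_kernel_pca`, and the two-circle toy data are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Embed the rows of X using the leading kernel principal components."""
    # Pairwise squared Euclidean distances between all samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values

    # The kernel matrix plays the role of inner products in the implicit
    # feature space (assumed kernel: Gaussian / RBF).
    K = np.exp(-gamma * sq_dists)

    # Centre the data in feature space by double-centring the kernel matrix.
    n = K.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    K_centred = J @ K @ J

    # Ordinary PCA in feature space reduces to an eigendecomposition of the
    # centred kernel matrix; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(K_centred)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Coordinates of the training points along each kernel component.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

# Toy data (an assumption for illustration): two concentric circles.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

Z = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)
```

The feature space is never constructed explicitly; only the n-by-n kernel matrix is needed, which is the point of the kernel trick.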

Gaussian process latent variable models (GPLVM) are a probabilistic form of nonlinear PCA. Like kernel PCA, they use a kernel function to form the mapping (in the form of a Gaussian process). However, in the GPLVM the mapping is from the embedded space to the data space (as in density networks and generative topographic mapping, GTM), whereas in kernel PCA it is in the opposite direction.

Other nonlinear techniques aim to preserve local structure; they include Locally Linear Embedding (LLE), Hessian LLE, Laplacian Eigenmaps, and local tangent space alignment (LTSA). These techniques construct a low-dimensional data representation using a cost function that retains local properties of the data (they can also be viewed as defining a graph-based kernel for kernel PCA). In this way, the techniques are capable of unfolding datasets such as the Swiss roll. Techniques that employ neighborhood graphs in order to retain global properties of the data include Isomap and Maximum Variance Unfolding.
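As one concrete instance of the local, graph-based techniques named above, the following is a minimal sketch of Laplacian Eigenmaps on a Swiss roll. It makes several assumptions for illustration: a symmetrised k-nearest-neighbour graph with binary edge weights (a heat-kernel weighting is another common choice), a connected graph, and hypothetical names such as `laplacian_eigenmaps` and `n_neighbors`.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, n_neighbors=10):
    """Embed the rows of X using eigenvectors of a k-NN graph Laplacian."""
    n = X.shape[0]

    # Pairwise squared distances; the diagonal is set to infinity so that
    # a point is never selected as its own neighbour.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(sq_dists, np.inf)

    # Binary adjacency matrix of the k-nearest-neighbour graph, symmetrised
    # so that an edge exists if either point lists the other as a neighbour.
    neighbours = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), neighbours.ravel()] = 1.0
    W = np.maximum(W, W.T)

    # Symmetric normalised Laplacian  I - D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # The smallest eigenvalue is zero and its eigenvector is trivial, so it
    # is skipped. Rescaling by D^{-1/2} gives solutions of the generalised
    # problem  L y = lambda D y.
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    return d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]

# Swiss roll toy data (an assumption for illustration).
rng = np.random.default_rng(0)
t = 1.5 * np.pi * (1.0 + 2.0 * rng.uniform(size=300))
height = 21.0 * rng.uniform(size=300)
X = np.column_stack([t * np.cos(t), height, t * np.sin(t)])

Y = laplacian_eigenmaps(X, n_components=2, n_neighbors=10)
print(Y.shape)
```

Only neighbouring points contribute to the cost, which is why such methods can follow the roll instead of cutting across it. If the neighbourhood graph is disconnected, the Laplacian has several zero eigenvalues and this sketch would need to treat each component separately.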

A completely different approach to nonlinear dimensionality reduction is through the use of autoencoders, a special kind of feed-forward neural network. Although the idea of autoencoders is quite old, training of deep autoencoders has only recently become possible through pretraining with restricted Boltzmann machines. Related to autoencoders is the NeuroScale algorithm, which uses stress functions inspired by multidimensional scaling and Sammon mappings (see below) to learn a nonlinear mapping from the high-dimensional space to the embedded space. The mappings in NeuroScale are based on radial basis function networks.
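To make the autoencoder idea concrete, here is a minimal sketch of a single-hidden-layer autoencoder trained by plain full-batch gradient descent on the reconstruction error. It deliberately does not use the restricted Boltzmann machine pretraining mentioned above, which matters mainly for deep networks. The architecture (tanh encoder, linear decoder), the learning rate, the step count, and the names `train_autoencoder` and `encode` are assumptions chosen for illustration.

```python
import numpy as np

def train_autoencoder(X, n_hidden=1, n_steps=2000, lr=0.1, seed=0):
    """Fit a tanh-encoder / linear-decoder autoencoder by gradient descent."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(d, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(scale=0.1, size=(n_hidden, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(n_steps):
        # Forward pass: encode to the low-dimensional code, then decode.
        H = np.tanh(X @ W_enc + b_enc)
        X_hat = H @ W_dec + b_dec
        err = X_hat - X
        losses.append(np.mean(err ** 2))

        # Backward pass: gradients of the mean squared reconstruction error.
        grad_out = 2.0 * err / err.size
        grad_W_dec = H.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * (1.0 - H ** 2)  # tanh derivative
        grad_W_enc = X.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)

        # Gradient descent update.
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec
    return (W_enc, b_enc, W_dec, b_dec), losses

def encode(X, params):
    """Map data to the embedded space using the trained encoder."""
    W_enc, b_enc, _, _ = params
    return np.tanh(X @ W_enc + b_enc)

# Toy data (an assumption for illustration): a curve in three dimensions.
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, t ** 2, t ** 3])
X = X - X.mean(axis=0)

params, losses = train_autoencoder(X, n_hidden=1)
codes = encode(X, params)
print(codes.shape, losses[0], losses[-1])
```

Unlike the proximity-based methods, the trained encoder is an explicit mapping, so new points can be embedded by calling `encode` without refitting.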

Kohonen maps (also called self-organizing maps) and their probabilistic variant, generative topographic mapping (GTM), use a point representation in the embedded space to form a latent variable model based on a nonlinear mapping from the embedded space to the high-dimensional space. These techniques are related to work on density networks, which are also based on the same probabilistic model.

Methods based on proximity matrices
A method based on proximity matrices is one where the data is presented to the algorithm in the form of a similarity matrix or a distance matrix. These methods all fall under the broader class of metric multidimensional scaling. The variations tend to differ in how the proximity data is computed. Isomap, locally linear embedding, maximum variance unfolding, and Sammon mapping (which is not in fact a mapping) are examples of metric multidimensional scaling methods.
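The simplest member of this family is classical (Torgerson) multidimensional scaling, which takes only a distance matrix as input. The sketch below is a minimal illustration under stated assumptions: the distances are Euclidean, and the names `classical_mds` and `pairwise_distances` are hypothetical helpers introduced for the example.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Recover coordinates from a matrix of pairwise distances D."""
    n = D.shape[0]
    # Double-centring the squared distances yields the Gram matrix of the
    # (unknown) centred coordinates.
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    B = -0.5 * J @ (D ** 2) @ J

    # The leading eigenvectors, scaled by the square roots of their
    # eigenvalues, give the embedding.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Toy data (an assumption for illustration): the algorithm sees only D,
# never the original points.
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))
D = pairwise_distances(points)

Y = classical_mds(D, n_components=2)

# The embedding reproduces the input distances, although Y may differ
# from the original points by a rotation, reflection, or translation.
print(np.allclose(pairwise_distances(Y), D, atol=1e-6))
```

The same routine illustrates how the variants differ only in the proximity data: Isomap, for example, replaces `D` with shortest-path distances through a neighbourhood graph before applying this step.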