Dimensionality reduction

In statistics, dimension reduction is the process of reducing the number of random variables under consideration; it can be divided into feature selection and feature extraction. In physics, dimension reduction is a widely discussed phenomenon in which a physical system exists in three dimensions but its properties behave like those of a lower-dimensional system. It has been experimentally realised at the quantum critical point in an insulating magnet made of the pigment Han Purple[1].

Feature selection

Main article: Feature selection

Feature selection approaches try to find a subset of the original variables (also called features or attributes). Two strategies are filter (e.g. information gain) and wrapper (e.g. genetic algorithm) approaches. See also combinatorial optimization problems.
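
The filter strategy can be illustrated with a minimal sketch. The example below is an assumption-laden illustration rather than a reference method: it scores each feature by its absolute Pearson correlation with the target (a simpler stand-in for the information gain mentioned above) and keeps the k best-scoring features. The function name filter_select and the synthetic data are hypothetical.

```python
import numpy as np

def filter_select(X, y, k):
    """Return indices of the k features most correlated with y, plus all scores.

    Hypothetical helper: the score is |Pearson correlation|, not information gain.
    """
    Xc = X - X.mean(axis=0)          # center every feature column
    yc = y - y.mean()                # center the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)   # guard against constant columns
    scores = np.abs(Xc.T @ yc) / denom
    best = np.argsort(scores)[::-1][:k]        # highest score first
    return best, scores

# Synthetic data (assumed for illustration): only features 1 and 3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

selected, scores = filter_select(X, y, k=2)
print("selected features:", sorted(selected.tolist()))
```

A wrapper approach would instead evaluate candidate subsets by training a model on each one, which is typically more expensive than scoring features independently as done here.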

In some cases, data analysis such as regression or classification can be performed more accurately in the reduced space than in the original space.

Feature extraction

Main article: Feature extraction

Feature extraction applies a mapping from the multidimensional space into a space of fewer dimensions. The original feature space is transformed, for example by a linear transformation obtained from principal components analysis.

Consider a string of beads, first 100 black and then 100 white. If the string is wadded up, a classification boundary between black and white beads will be very complicated in three dimensions. However, there is a mapping from three dimensions to one dimension, namely distance along the string, which makes the classification trivial. Unfortunately, a simplification as dramatic as that is rarely possible in practice.

The main linear technique for dimensionality reduction, principal components analysis (PCA), performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance matrix (or sometimes the correlation matrix) of the data is constructed and the eigenvectors of this matrix are computed. The eigenvectors that correspond to the largest eigenvalues (the principal components) can then be used to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behaviour of the system. The original space (with dimension equal to the number of variables) has been reduced (with data loss, but hopefully retaining the most important variance) to the space spanned by a few eigenvectors.
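
The eigenvector procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: it uses the covariance matrix of the centered data (standardizing the columns first would give the correlation-matrix variant), and the function name pca and the synthetic data are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its leading principal components (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                   # center the data
    cov = np.cov(Xc, rowvar=False)            # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]    # principal directions as columns
    projected = Xc @ components               # low-dimensional representation
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return projected, components, explained

# Synthetic data (assumed): 3-D points lying close to a single line.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, 2.0 * t, -t]) + 0.05 * rng.normal(size=(500, 3))

Z, W, explained = pca(X, n_components=1)
print("fraction of variance retained:", explained)
```

Because the data were generated along one direction plus small noise, a single component should retain almost all of the variance; real data rarely reduce this cleanly.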

Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The resulting technique, called kernel PCA, is capable of constructing nonlinear mappings that maximize the variance in the data. Other nonlinear techniques include manifold learning techniques such as locally linear embedding (LLE), Hessian LLE, Laplacian eigenmaps, and local tangent space alignment (LTSA). These techniques construct a low-dimensional data representation using a cost function that retains local properties of the data (they can also be viewed as defining a graph-based kernel for kernel PCA). In this way, the techniques are capable of unfolding datasets such as the Swiss roll. Techniques that employ neighbourhood graphs in order to retain global properties of the data include Isomap and maximum variance unfolding. Neighbourhood preservation can also be achieved by minimising a weighted difference between distances in the input and output spaces, as in curvilinear component analysis (CCA) and data-driven high-dimensional scaling (DD-HDS).
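
As a rough illustration of the kernel trick applied to PCA, the sketch below builds a kernel matrix, centers it in feature space, and uses its leading eigenvectors as the low-dimensional coordinates. The choice of a Gaussian (RBF) kernel, the value of gamma, the function name rbf_kernel_pca, and the two-circle data set are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel_pca(X, n_components, gamma):
    """Kernel PCA with an RBF kernel (illustrative sketch, training points only)."""
    # Pairwise squared Euclidean distances
    sq = (X ** 2).sum(axis=1)
    sq_dists = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * sq_dists)             # RBF kernel matrix

    # Center the kernel matrix, i.e. center the data in feature space
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Leading eigenpairs of the centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)     # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Coordinates of the training points along the kernel principal components
    Z = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return Z, eigvals

# Synthetic data (assumed): two noisy concentric circles in the plane.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
radii = np.where(np.arange(200) < 100, 1.0, 3.0)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X += 0.05 * rng.normal(size=X.shape)

Z, eigvals = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print("embedding shape:", Z.shape)
```

How well the embedding separates nonlinear structure depends strongly on the kernel and its parameters, so gamma would normally need tuning for a given data set.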

Principal curves and manifolds[1] give another framework for PCA generalization and extend the geometric interpretation of PCA by explicitly constructing a low-dimensional embedded manifold for data approximation.

A completely different approach to nonlinear dimensionality reduction is through the use of autoencoders, a special kind of feed-forward neural network. Although the idea of autoencoders is quite old, training of deep autoencoders has only recently become possible through pretraining with restricted Boltzmann machines.
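
The autoencoder idea can be sketched with a very small network. The example below is only an illustration under several assumptions: it uses a single tanh hidden layer as the low-dimensional code, a linear decoder, and plain full-batch gradient descent on the mean squared reconstruction error. It does not use restricted Boltzmann machine pretraining, and the function name train_autoencoder, the hyperparameters, and the data are hypothetical.

```python
import numpy as np

def train_autoencoder(X, code_size, lr=0.1, steps=2000, seed=0):
    """Train a one-hidden-layer autoencoder by gradient descent (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = 0.5 * rng.normal(size=(d, code_size))   # encoder weights
    b1 = np.zeros(code_size)
    W2 = 0.5 * rng.normal(size=(code_size, d))   # decoder weights
    b2 = np.zeros(d)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)        # code (bottleneck activations)
        X_hat = H @ W2 + b2             # reconstruction
        E = X_hat - X
        losses.append(float(np.mean(E ** 2)))
        # Backpropagate the mean squared error
        dX_hat = 2.0 * E / (n * d)
        dW2 = H.T @ dX_hat
        db2 = dX_hat.sum(axis=0)
        dA = (dX_hat @ W2.T) * (1.0 - H ** 2)    # derivative of tanh
        dW1 = X.T @ dA
        db1 = dA.sum(axis=0)
        # Gradient descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    encode = lambda A: np.tanh(A @ W1 + b1)
    return encode, losses

# Synthetic data (assumed): 3-D points on a one-parameter curve.
t = np.linspace(-1.0, 1.0, 200)
X = np.column_stack([t, t ** 2, t ** 3])

encode, losses = train_autoencoder(X, code_size=1)
code = encode(X)
print("reconstruction error: first step", losses[0], "last step", losses[-1])
```

The one-dimensional code plays the role of the reduced representation; a deeper encoder and decoder would be needed for more complicated manifolds.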
