Non-negative matrix factorization



Non-negative matrix factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra in which a matrix $$\mathbf{X}$$ is factorized into (usually) two matrices, $$\mathbf{W}$$ and $$\mathbf{H}$$: $$\mathbf{X} \approx \mathbf{W}\mathbf{H}$$

Factorization of matrices is generally non-unique, and a number of different methods have been developed by incorporating different constraints (e.g., principal component analysis and singular value decomposition); non-negative matrix factorization differs from these methods in that it enforces the constraint that the factors W and H must be non-negative, i.e., all elements must be greater than or equal to zero.

Usually the number of columns of W and the number of rows of H in NMF are selected so that the product WH becomes an approximation to X (it has been suggested that the NMF model should instead be called nonnegative matrix approximation). The full decomposition of X then amounts to the two non-negative matrices W and H together with a residual U, such that X = WH + U. The elements of the residual matrix can be either negative or positive.
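The shapes involved can be sketched in NumPy; the matrix sizes below are arbitrary illustrative choices, and X is deliberately built as an exact product so the residual vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 5, 2        # X is m x n; k is the chosen inner dimension
W = rng.random((m, k))   # non-negative factor, m x k
H = rng.random((k, n))   # non-negative factor, k x n
X = W @ H                # built to be exactly rank k, so the residual is zero here
U = X - W @ H            # residual matrix; in general its entries can have either sign
```

For a general non-negative X and small k, U would be nonzero and WH only an approximation.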

History
Early work on non-negative matrix factorizations was performed by a Finnish group of researchers in the middle of the 1990s under the name positive matrix factorization. It became more widely known as non-negative matrix factorization after Lee and Seung investigated the properties of the algorithm and published some simple and useful algorithms for two types of factorizations.

Types
There are different types of non-negative matrix factorizations. The different types arise from using different cost functions for measuring the divergence between X and WH, and possibly from regularizing the W and/or H matrices.

Two simple divergence functions studied by Lee and Seung are the squared error (or Frobenius norm) and an extension of the Kullback–Leibler divergence to positive matrices (the original Kullback–Leibler divergence is defined on probability distributions). Each divergence leads to a different NMF algorithm, usually minimizing the divergence using iterative update rules.

There are several ways in which W and H may be found: Lee and Seung's updates are usually referred to as the multiplicative update method, while others have suggested so-called alternating non-negative least squares and "projected gradient".
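A minimal sketch of the multiplicative update method for the Frobenius-norm objective, assuming NumPy; the small `eps` is an added numerical safeguard against division by zero, not part of the original update rules:

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative updates for min ||X - WH||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Elementwise multiply/divide, so non-negativity is preserved.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

X = np.random.default_rng(1).random((6, 5))   # non-negative data
W, H = nmf_multiplicative(X, k=2)
err = np.linalg.norm(X - W @ H)
```

Because each update multiplies by a ratio of non-negative terms, the factors never leave the non-negative orthant, which is what makes this scheme attractive despite its simplicity.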

Relation to other techniques
The initial paper by Lee & Seung proposed NMF mainly for parts-based decomposition of images. It compares NMF to vector quantization and principal component analysis, and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.

It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA". When NMF is obtained by minimizing the Kullback–Leibler divergence, it is in fact equivalent to another instance of multinomial PCA, probabilistic latent semantic analysis, trained by maximum likelihood estimation. That method is commonly used for analyzing and clustering textual data and is also related to the latent class model.

It was also shown that when the Frobenius norm is used as a divergence, NMF is equivalent to a relaxed form of K-means clustering: matrix factor W contains cluster centroids and H contains cluster membership indicators. This also justifies the use of NMF for data clustering.
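The clustering reading can be illustrated with a toy, hand-chosen factorization (the numbers below are illustrative assumptions, not the output of an NMF solver): columns of W play the role of centroids, and each column of H weights those centroids for one data point.

```python
import numpy as np

# Columns of X are data points; the first three cluster near (5, 0),
# the last two near (0, 4).
X = np.array([[5.0, 5.2, 4.8, 0.1, 0.0],
              [0.0, 0.1, 0.2, 4.0, 3.9]])

# One approximate non-negative factorization, chosen by hand:
W = np.array([[5.0, 0.0],    # column 1 of W ~ centroid of the first cluster
              [0.1, 4.0]])   # column 2 of W ~ centroid of the second cluster
H = np.array([[1.0, 1.04, 0.96, 0.0, 0.0],
              [0.0, 0.0, 0.05, 1.0, 0.98]])

labels = H.argmax(axis=0)    # hard cluster assignment per data point
```

Taking the argmax of each column of H recovers the two groups, mirroring how K-means assigns each point to its nearest centroid.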

NMF extends beyond matrices to tensors of arbitrary order.

Uniqueness
The factorization is not unique: a matrix $$\mathbf{B}$$ and its inverse can be used to transform the two factorization matrices by, e.g.,
 * $$\mathbf{WH} = \mathbf{WBB}^{-1}\mathbf{H}$$

If the two new matrices $$\mathbf{\tilde{W} = WB}$$ and $$\mathbf{\tilde{H}}=\mathbf{B}^{-1}\mathbf{H}$$ are non-negative they form another parametrization of the factorization.

The non-negativity of $$\mathbf{\tilde{W}}$$ and $$\mathbf{\tilde{H}}$$ holds at least if B is a non-negative monomial matrix. In this simple case the transformation corresponds merely to a scaling and a permutation.
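A small NumPy check of this, using a monomial B built from a permutation and a positive scaling (the particular matrices are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((4, 2))
H = rng.random((2, 3))

# B is a non-negative monomial matrix: a permutation times a positive scaling.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])        # swap the two components
D = np.diag([2.0, 0.5])          # rescale them
B = P @ D
B_inv = np.linalg.inv(B)         # also non-negative monomial

W2 = W @ B                       # still non-negative
H2 = B_inv @ H                   # still non-negative
```

W2 and H2 give exactly the same product WH, so they form an equally valid parametrization of the factorization.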

More control over the non-uniqueness of NMF is obtained with sparsity constraints.
