Approximate nonnegative matrix factorization

Approximate nonnegative matrix factorization is a term for any algorithm that factors a matrix of nonnegative values into two other nonnegative matrices whose product approximately equals the original matrix. The factor matrices are chosen to be of smaller dimension than their product, and are hence easier to store and manipulate.

Formal problem description
Given a nonnegative matrix A of size m x n, and a positive integer k smaller than both m and n, find a nonnegative m x k matrix W and a nonnegative k x n matrix H that minimize the function
 * F(W,H) = 1/2 ||A - WH||_F^2
where ||.||_F denotes the Frobenius norm.
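As a small illustration, the objective can be evaluated directly for a candidate pair of factors. The sketch below assumes NumPy is available; the function name objective and the example matrices are hypothetical, chosen only for illustration.

```python
import numpy as np


def objective(A, W, H):
    """Return F(W, H) = 1/2 * ||A - W H||_F^2 (squared Frobenius norm)."""
    residual = A - W @ H
    return 0.5 * np.linalg.norm(residual, "fro") ** 2


# Hypothetical 3 x 2 example with k = 1.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 7.0]])
W = np.array([[1.0], [2.0], [3.0]])   # 3 x 1, nonnegative
H = np.array([[1.0, 2.0]])            # 1 x 2, nonnegative

value = objective(A, W, H)
```

In this example the product of W and H matches A everywhere except the bottom-right entry, where the difference is 1, so F(W,H) = 1/2.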

The product of W and H is called the nonnegative matrix factorization of A, though it may not exactly equal A (hence it is an approximate factorization). The solution is generally not unique.
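One simple source of non-uniqueness is rescaling: multiplying the columns of W by positive constants and dividing the corresponding rows of H by the same constants leaves the product unchanged. A minimal sketch, assuming NumPy and using hypothetical example factors:

```python
import numpy as np

# Hypothetical nonnegative factors (4 x 2 and 2 x 3).
W = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0],
              [1.0, 1.0]])
H = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 4.0]])

# Any diagonal matrix with positive entries yields another valid pair.
D = np.diag([2.0, 0.5])
W2 = W @ D                  # scales the columns of W
H2 = np.linalg.inv(D) @ H   # scales the rows of H by the reciprocals

same_product = np.allclose(W @ H, W2 @ H2)
still_nonnegative = bool((W2 >= 0).all() and (H2 >= 0).all())
```

Both pairs give the same value of F(W,H) for any A, so a minimizer is never the only one.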

Algorithms
Algorithms for finding nonnegative matrix factorizations include multiplicative update algorithms, gradient descent algorithms, and alternating least-squares algorithms. These algorithms can typically be guaranteed to find only a local minimum of F(W,H), rather than a global minimum. In many data mining applications, however, a local minimum is still good enough to be useful.
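To make the multiplicative update approach concrete, the following is a minimal sketch of one widely used rule (the Lee and Seung updates for the Frobenius-norm objective). It assumes NumPy; the function name nmf_multiplicative, the random initialization, the iteration count, and the small constant eps added to the denominators to avoid division by zero are all illustrative choices rather than part of any particular reference implementation.

```python
import numpy as np


def nmf_multiplicative(A, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative m x n matrix A by W @ H (W: m x k, H: k x n).

    Returns W, H, and the values of F(W, H) before the first update
    and after every iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))   # random nonnegative starting point
    H = rng.random((k, n))
    history = [0.5 * np.linalg.norm(A - W @ H, "fro") ** 2]
    for _ in range(n_iter):
        # Elementwise updates; factors stay nonnegative because every
        # term in the numerators and denominators is nonnegative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
        history.append(0.5 * np.linalg.norm(A - W @ H, "fro") ** 2)
    return W, H, history


# Hypothetical test matrix with entries in [0, 1).
A = np.random.default_rng(1).random((6, 5))
W, H, history = nmf_multiplicative(A, k=2)
```

The values in history do not increase from one iteration to the next, but the point the iteration settles on depends on the starting matrices, which is why different seeds can produce different factorizations.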

Text mining
Approximate nonnegative matrix factorization can be used for text mining applications. In this process, a term-document matrix is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents. This matrix is factored into a term-feature matrix and a feature-document matrix. The features are inferred from the contents of the documents, and the feature-document matrix describes data clusters of related documents.
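The process can be sketched on a toy corpus. The example below assumes NumPy, uses a hypothetical six-term, four-document matrix of raw counts (a real application would typically apply a weighting scheme first), and factors it with the Lee and Seung multiplicative updates as one possible choice of algorithm. Each document is then assigned to the feature with the largest weight in its column of the feature-document matrix.

```python
import numpy as np

# Hypothetical vocabulary and term-document matrix (rows: terms, columns: documents).
terms = ["goal", "team", "match", "stock", "market", "bank"]
A = np.array([[3, 2, 0, 0],
              [2, 3, 0, 1],
              [1, 2, 0, 0],
              [0, 0, 3, 2],
              [0, 1, 2, 3],
              [0, 0, 1, 2]], dtype=float)

k = 2                                   # number of features to infer
rng = np.random.default_rng(0)
W = rng.random((A.shape[0], k))         # term-feature matrix
H = rng.random((k, A.shape[1]))         # feature-document matrix
eps = 1e-10                             # guards against division by zero

for _ in range(500):
    H *= (W.T @ A) / (W.T @ W @ H + eps)
    W *= (A @ H.T) / (W @ H @ H.T + eps)

# Dominant feature for each document (a simple clustering rule).
doc_feature = H.argmax(axis=0)

# The three highest-weighted terms for each feature.
top_terms = [[terms[i] for i in np.argsort(W[:, j])[::-1][:3]]
             for j in range(k)]
```

Documents that share a value in doc_feature form a cluster, and top_terms gives a rough description of what each feature represents. The result can vary with the random starting point.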

Spectral data analysis
Approximate nonnegative matrix factorization is also used to analyze spectral data; one such use is in the classification of space objects and debris.

Current research
Current research in approximate nonnegative matrix factorization includes techniques for initializing the factor matrices for various algorithms, algorithms that find global minima of F(W,H), and efficient ways to update the factors when new data are added to the matrix.