Logistic regression

Overview
In statistics, logistic regression is a regression model for binomially distributed response/dependent variables. It is useful for modeling the probability of an event occurring as a function of other factors. It is a generalized linear model that uses the logit as its link function.

Logistic regression is used extensively in the medical and social sciences. Names used for it in other application areas include the logistic model, the logit model, and the maximum-entropy classifier.

Logistic regression analyzes binomially distributed data of the form


 * $$Y_i \sim B(n_i,p_i),\text{ for }i = 1, \dots, m,$$

where the numbers of Bernoulli trials $$n_i$$ are known and the probabilities of success $$p_i$$ are unknown. An example of such data is the number of seeds $$Y_i$$ that germinate after $$n_i$$ are planted, where $$p_i$$ is the probability that a single seed germinates.

The model assumes that for each group of trials (each value of i) there is a set of explanatory (independent) variables that might inform the probability of success. These explanatory variables can be collected in a k-dimensional vector $$X_i$$, and the model then takes the form


 * $$p_i = \operatorname{E}\left(\left.\frac{Y_i}{n_{i}}\right|X_i \right). \,\!$$

The logits of the unknown binomial probabilities (i.e., the logarithms of the odds) are modelled as a linear function of the $$X_i$$.


 * $$\operatorname{logit}(p_i)=\ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}.$$

Note that a particular element of $$X_i$$ can be set to 1 for all i to yield an intercept in the model. The unknown parameters $$\beta_j$$ are usually estimated by maximum likelihood.
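As a rough illustration of maximum-likelihood estimation, the following sketch fits a model with an intercept and a single explanatory variable to grouped binomial data using Newton-Raphson iterations. The function names `sigmoid` and `fit_logistic`, the fixed iteration count, and the data values are all assumptions made up for this example; statistical software typically uses a more general and more careful version of the same idea.

```python
import math

def sigmoid(z):
    """Inverse of the logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, successes, trials, iterations=25):
    """Newton-Raphson maximum likelihood for logit(p_i) = b0 + b1 * x_i."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the binomial log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for x, y, n in zip(xs, successes, trials):
            p = sigmoid(b0 + b1 * x)
            residual = y - n * p          # observed minus expected successes
            weight = n * p * (1.0 - p)    # binomial variance of Y_i
            g0 += residual
            g1 += residual * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * step = gradient for the 2x2 case.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up illustrative data: 20 trials at each of four values of the predictor.
xs = [0.0, 1.0, 2.0, 3.0]
trials = [20, 20, 20, 20]
successes = [3, 8, 13, 18]

b0, b1 = fit_logistic(xs, successes, trials)
print(b0, b1)
```

At the maximum, the gradient terms `g0` and `g1` are zero, which means the fitted expected counts match the observed counts in total and when weighted by the predictor.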

The $$\beta_j$$ parameter estimates are interpreted as the additive effect on the log odds of a unit change in the jth explanatory variable. In the case of a dichotomous explanatory variable, for instance gender, $$e^{\beta_j}$$ is the estimate of the odds ratio of having the outcome for, say, males compared with females.
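The odds-ratio interpretation can be checked numerically. The sketch below assumes a model with an intercept and one dichotomous variable coded 0 or 1; the coefficient values and the helper names are hypothetical and chosen only for illustration.

```python
import math

def predicted_probability(b0, b1, x):
    """Probability of the outcome under logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds in favour of the outcome."""
    return p / (1.0 - p)

# Hypothetical coefficients, not estimated from any real data.
b0, b1 = -1.0, 0.7

odds_group_1 = odds(predicted_probability(b0, b1, 1))
odds_group_0 = odds(predicted_probability(b0, b1, 0))
odds_ratio = odds_group_1 / odds_group_0

# The ratio of the two odds agrees with exp(b1) up to rounding error.
print(odds_ratio, math.exp(b1))
```

The intercept cancels in the ratio, so the odds ratio depends only on the coefficient of the dichotomous variable.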

The model has an equivalent formulation as


 * $$p_i = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}. \,\!$$
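The equivalence of the two formulations amounts to the logistic function being the inverse of the logit. A minimal sketch, with function names chosen here for illustration, converts a probability to its logit and back:

```python
import math

def logit(p):
    """Log odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logistic(eta):
    """Probability corresponding to a linear predictor (log odds) eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Round trip: probability -> log odds -> probability.
for p in (0.1, 0.3, 0.5, 0.9):
    eta = logit(p)
    print(p, eta, logistic(eta))
```

A probability of 0.5 corresponds to a linear predictor of zero, and probabilities above 0.5 correspond to positive values of the linear predictor.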

This functional form is commonly identified as a single-layer "perceptron" or single-layer artificial neural network. Such a network computes a continuous output instead of a step function. The derivative of $$p_i$$ with respect to $$X = (x_1, \dots, x_k)$$ is computed from the general form:


 * $$y = \frac{1}{1+e^{-f(X)}}$$

where f(X) is an analytic function of X. With this choice of output function, and with f(X) taken to be the linear predictor $$\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$, the single-layer network is identical to the logistic regression model. The function has a continuous derivative, which allows it to be used in backpropagation. It is also preferred because its derivative is easily calculated:


 * $$y' = y(1-y)\frac{\mathrm{d}f}{\mathrm{d}X}\,\!$$
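The derivative formula can be verified against a finite-difference approximation. The sketch below assumes the simplest case of a scalar input with a linear f(x) = b0 + b1 x, so that df/dx = b1; the coefficient values, the evaluation point, and the step size are arbitrary illustrative choices.

```python
import math

def output(x, b0, b1):
    """y = 1 / (1 + exp(-f(x))) with the linear choice f(x) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def analytic_derivative(x, b0, b1):
    """y' = y (1 - y) df/dx, where df/dx = b1 for the linear f."""
    y = output(x, b0, b1)
    return y * (1.0 - y) * b1

def numerical_derivative(x, b0, b1, h=1e-6):
    """Central finite-difference approximation of dy/dx."""
    return (output(x + h, b0, b1) - output(x - h, b0, b1)) / (2.0 * h)

# Arbitrary illustrative values.
b0, b1, x = 0.5, -1.2, 0.8
print(analytic_derivative(x, b0, b1), numerical_derivative(x, b0, b1))
```

The two values agree closely, and the analytic form needs only the already-computed output y, which is part of why this function is convenient in backpropagation.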

Extensions
Extensions of the model, such as polytomous regression, exist to cope with multi-category and ordinal dependent variables. Multi-class classification by logistic regression is known as multinomial logit modeling. An extension of the logistic model to sets of interdependent variables is the conditional random field.

Example
Let p(x) be the probability of success when the value of the predictor variable is x. Then let


 * $$p(x) = \frac{1}{1+e^{-(B_0+B_1x)}} = \frac{e^{B_0 + B_1x}}{1+e^{B_0+B_1x}}.$$

Algebraic manipulation shows that


 * $$\frac{p(x)}{1-p(x)} = e^{B_0+B_1x},$$

where $$\frac{p(x)}{1-p(x)}$$ is the odds in favor of success.

If we take a simple example, say p(50) = 2/3, then


 * $$\frac{p(50)}{1-p(50)} = \frac{\frac{2}{3}}{1-\frac{2}{3}} = 2.$$

So when x = 50, a success is twice as likely as a failure; equivalently, the odds are 2 to 1 in favor of success.
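The arithmetic of this example can be reproduced exactly with rational numbers. This is a small sketch; the helper name `odds` and the variable names are chosen here for illustration.

```python
from fractions import Fraction
import math

def odds(p):
    """Odds in favour of success for a success probability p."""
    return p / (1 - p)

p_50 = Fraction(2, 3)   # p(50) = 2/3, as in the example
odds_50 = odds(p_50)
print(odds_50)          # 2

# Under the model, the log odds equal B_0 + B_1 * 50, so here that sum is ln 2.
log_odds_50 = math.log(odds_50)
print(log_odds_50)
```

Knowing p(50) alone fixes only the sum B_0 + 50 B_1; a second point on the curve would be needed to determine the two coefficients separately.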