Logistic regression



In statistics, logistic regression is a regression model for binomially distributed response (dependent) variables. It is useful for modeling the probability of an event occurring as a function of other factors. It is a generalized linear model that uses the logit as its link function.

Logistic regression is used extensively in the medical and social sciences. In other application areas it is also known as the logistic model, the logit model, or the maximum-entropy classifier.


Overview

Logistic regression analyzes binomially distributed data of the form

Y_i \ \sim  B(p_i,n_i),\text{ for }i = 1, \dots , m,

where the numbers of Bernoulli trials n_i are known and the probabilities of success p_i are unknown. An example is seed germination: n_i seeds are planted, Y_i of them germinate, and p_i is the probability that a seed germinates (estimated by the observed fraction Y_i/n_i).

In the model, each observation i (a batch of n_i trials) comes with a set of explanatory (independent) variables that may inform its probability of success. Collecting these explanatory variables in a k-vector X_i, the model takes the form

p_i = \operatorname{E}\left(\left.\frac{Y_i}{n_{i}}\right|X_i \right). \,\!

The logits of the unknown binomial probabilities (i.e., the logarithms of the odds) are modelled as a linear function of the X_i:

\operatorname{logit}(p_i)=\ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}.

Equivalently, the intercept β_0 can be obtained by setting one element of X_i to 1 for all i. The unknown parameters β_j are usually estimated by maximum likelihood.
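The maximum-likelihood estimates generally have no closed form and are computed iteratively. The sketch below assumes a single explanatory variable plus an intercept and uses Newton-Raphson, which for the logit link coincides with iteratively reweighted least squares. The function name `fit_logistic_binomial` and its arguments are hypothetical, chosen for illustration.

```python
import math

def fit_logistic_binomial(x, y, n, iters=25):
    """Newton-Raphson MLE for logit(p_i) = b0 + b1*x_i, with y_i ~ B(n_i, p_i).

    x: explanatory values, y: success counts, n: numbers of trials.
    Hypothetical helper for illustration; assumes the data are not separable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - ni * p          # observed minus expected successes
            w = ni * p * (1.0 - p)   # binomial variance, the IRLS weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1
```

For example, with two groups of 10 seeds at x = 0 and x = 1 and 5 and 8 germinations, the fit reproduces the observed fractions: b0 = logit(0.5) = 0 and b1 = logit(0.8) − 0 = ln 4.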

The parameter β_j is the additive effect on the log odds of a one-unit increase in the jth explanatory variable, holding the others fixed; equivalently, e^{β_j} is the odds ratio for that one-unit increase. In the case of a dichotomous explanatory variable, for instance gender coded 1 for males and 0 for females, e^{β_j} is the estimated odds ratio of having the outcome for males compared with females.
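A small numerical check of this interpretation: the ratio of the odds at x + 1 to the odds at x equals e^{β_1}, whatever the starting value of x. The coefficients below are made up for illustration, and the helper names are hypothetical.

```python
import math

def prob(b0, b1, x):
    """Success probability under logit(p) = b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Odds in favor of success."""
    return p / (1.0 - p)

b0, b1 = -1.0, 0.7  # hypothetical coefficients

# Continuous predictor: a one-unit increase from x = 2 to x = 3
ratio = odds(prob(b0, b1, 3.0)) / odds(prob(b0, b1, 2.0))

# Dichotomous predictor coded 1 (e.g. males) versus 0 (e.g. females)
ratio_binary = odds(prob(b0, b1, 1.0)) / odds(prob(b0, b1, 0.0))

print(ratio, ratio_binary, math.exp(b1))  # all three agree up to rounding
```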

The model has an equivalent formulation as

p_i = \frac{1}{1+e^{-(\beta_0 + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}. \,\!
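The two formulations are inverses of each other: applying the logistic function to the linear predictor gives p_i, and applying the logit to p_i recovers the linear predictor. The sketch below checks this with made-up coefficients and covariates; the leading 1 in `x` plays the role of the intercept element of X_i.

```python
import math

def logit(p):
    """Log odds, ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def logistic(t):
    """Inverse of the logit, 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

beta = [0.5, -1.2, 2.0]  # hypothetical beta_0, beta_1, beta_2
x = [1.0, 0.3, 0.8]      # leading 1 yields the intercept term

eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor, here 1.74
p = logistic(eta)                            # probability in (0, 1)
print(p, logit(p))                           # logit(p) recovers eta
```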

This functional form is the same as a single-layer artificial neural network with a logistic (sigmoid) activation, sometimes loosely called a single-layer "perceptron"; unlike the classic perceptron, which applies a step function, it computes a continuous output. The derivative of p_i with respect to X = (x_1, ..., x_k) follows from the general form

y = \frac{1}{1+e^{-f(X)}}

where f(X) is a differentiable function of X. When f is linear, f(X) = β_0 + β_1 x_1 + ⋯ + β_k x_k, the single-layer network is identical to the logistic regression model. Because the logistic function has a continuous derivative, it can be used in backpropagation, and that derivative is easy to compute:

\frac{\partial y}{\partial x_j} = y(1-y)\,\frac{\partial f}{\partial x_j}, \quad j = 1, \dots, k. \,\!
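The derivative formula can be checked numerically. The sketch below assumes a linear f, so that ∂f/∂x_j = β_j, and compares y(1 − y)β_j with a central finite difference; the function names and the coefficient values are hypothetical.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def f(x, beta):
    """Linear predictor beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def analytic_grad(x, beta):
    """dy/dx_j = y(1 - y) * df/dx_j, with df/dx_j = beta_j for linear f."""
    y = logistic(f(x, beta))
    return [y * (1.0 - y) * b for b in beta[1:]]

def numeric_grad(x, beta, h=1e-6):
    """Central finite-difference approximation of dy/dx_j."""
    grads = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        grads.append((logistic(f(xp, beta)) - logistic(f(xm, beta))) / (2 * h))
    return grads

beta = [0.2, -0.5, 1.5]  # hypothetical coefficients
x = [0.4, 1.1]
print(analytic_grad(x, beta))
print(numeric_grad(x, beta))  # should match to about 1e-9
```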

Extensions

Extensions of the model handle dependent variables with more than two categories (polytomous regression) and ordinal dependent variables. Multi-class classification by logistic regression is known as multinomial logit modeling. An extension of the logistic model to sets of interdependent variables is the conditional random field.
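A minimal sketch of multinomial logit probabilities, assuming one common parameterization in which class 0 is a reference category whose linear predictor is fixed at 0 and every other class has its own coefficient vector. The function name and the coefficient layout are hypothetical. With a single non-reference class the formula reduces to the binary logistic model.

```python
import math

def multinomial_probs(x, betas):
    """Class probabilities under a multinomial logit model.

    betas: one list [b0, b1, ..., bk] per non-reference class;
    class 0 is the reference category with score 0.
    """
    scores = [0.0] + [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two classes: probability of class 1 equals the logistic function of its score
p2 = multinomial_probs([2.0], [[0.5, -0.3]])  # score = 0.5 - 0.6 = -0.1

# Three classes, two explanatory variables
p3 = multinomial_probs([1.0, -2.0], [[0.2, 0.5, 0.1], [-0.4, 0.3, 0.0]])
print(p2, p3)
```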

Example

Let p(x) be the probability of success when the value of the predictor variable is x. Then let

p(x) = \frac{1}{1+e^{-(B_0+B_1x)}} = \frac{e^{B_0 + B_1x}}{1+e^{B_0+B_1x}}.

Algebraic manipulation shows that

\frac{p(x)}{1-p(x)} = e^{B_0+B_1x},

where \frac{p(x)}{1-p(x)} is the odds in favor of success.
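The algebraic step can be spelled out: first simplify 1 − p(x), then divide p(x) by it.

1 - p(x) = 1 - \frac{e^{B_0+B_1x}}{1+e^{B_0+B_1x}} = \frac{1}{1+e^{B_0+B_1x}},

\frac{p(x)}{1-p(x)} = \frac{e^{B_0+B_1x}}{1+e^{B_0+B_1x}}\cdot\left(1+e^{B_0+B_1x}\right) = e^{B_0+B_1x}.

Taking logarithms gives \ln\frac{p(x)}{1-p(x)} = B_0 + B_1x, the logit form of the model with a single predictor.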

If we take a simple example, say p(50) = 2/3, then

\frac{p(50)}{1-p(50)} = \frac{\frac{2}{3}}{1-\frac{2}{3}} = 2.

So when x = 50, a success is twice as likely as a failure; equivalently, the odds are 2 to 1.
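The same arithmetic can be checked with exact rational numbers; the helper name `odds` is hypothetical. Converting back with p = odds/(1 + odds) recovers the original probability.

```python
from fractions import Fraction

def odds(p):
    """Odds in favor of success, p / (1 - p)."""
    return p / (1 - p)

p50 = Fraction(2, 3)
o = odds(p50)
print(o)            # 2
print(o / (1 + o))  # 2/3, back to the probability
```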







