Logistic regression
In statistics, logistic regression is a regression model for binomially distributed response/dependent variables. It is useful for modeling the probability of an event occurring as a function of other factors. It is a generalized linear model that uses the logit as its link function.
Logistic regression is used extensively in the medical and social sciences. Other names for logistic regression used in various other application areas include logistic model, logit model, and maximum-entropy classifier.
Overview
Logistic regression analyzes binomially distributed data of the form
$$Y_i \sim B(n_i, p_i), \quad i = 1, \dots, m,$$
where the numbers of Bernoulli trials $n_i$ are known and the probabilities of success $p_i$ are unknown. An example of this distribution is the number of seeds $Y_i$ that germinate after $n_i$ are planted, each germinating with probability $p_i$.
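As a rough illustration of the binomial setup, the following sketch computes the distribution of the number of germinating seeds for a single group. The numbers (10 seeds, germination probability 0.8) and the function name are made up for illustration and do not come from any data set.

```python
from math import comb

def binomial_pmf(y: int, n: int, p: float) -> float:
    """Probability of exactly y successes in n independent trials, each with success probability p."""
    return comb(n, y) * p**y * (1.0 - p) ** (n - y)

# Hypothetical group: 10 seeds planted, each germinating with probability 0.8.
n_seeds, p_germinate = 10, 0.8
pmf = [binomial_pmf(y, n_seeds, p_germinate) for y in range(n_seeds + 1)]

# The expected fraction of germinating seeds equals the success probability.
expected_fraction = sum(y * q for y, q in enumerate(pmf)) / n_seeds
```

In logistic regression the success probability is not given in advance as it is here; it is the quantity the model tries to explain.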
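The model is then that for each group of trials (each value of $i$) there is a set of explanatory/independent variables that might inform the final probability. These explanatory variables can be thought of as forming a vector $X_i$ of length $k$, and the model then takes the form
$$p_i = \operatorname{E}\left( Y_i / n_i \mid X_i \right),$$
that is, $p_i$ is the expected proportion of successes $Y_i / n_i$ given $X_i$.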
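The logits of the unknown binomial probabilities (i.e., the logarithms of the odds) are modelled as a linear function of the $X_i$:
$$\operatorname{logit}(p_i) = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i},$$
where $x_{j,i}$ is the $j$th element of $X_i$.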
Note that a particular element of $X_i$ can be set to 1 for all $i$ to yield an intercept in the model. The unknown parameters $\beta_j$ are usually estimated by maximum likelihood.
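One common way to carry out the maximum likelihood estimation is Newton-Raphson iteration on the binomial log-likelihood. The sketch below is a minimal version for an intercept plus a single explanatory variable; the function name `fit_logistic`, the starting values, and the grouped data are all assumptions made for illustration, not something prescribed by the text.

```python
import math

def fit_logistic(x, n, y, iterations=25, tol=1e-10):
    """Newton-Raphson fit of logit(p_i) = b0 + b1 * x_i to grouped binomial data.

    x[i] is the explanatory value, n[i] the number of trials,
    and y[i] the number of successes in group i.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the negated Hessian
        for xi, ni, yi in zip(x, n, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - ni * p            # observed minus expected successes
            w = ni * p * (1.0 - p)     # binomial variance of group i
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system (negated Hessian) * step = gradient.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Made-up grouped data: 10 trials per group, successes rising with x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [10, 10, 10, 10, 10]
y = [1, 3, 5, 7, 9]
b0_hat, b1_hat = fit_logistic(x, n, y)
```

Because this made-up data set is symmetric about x = 2, the fitted probability there is 0.5, which gives a quick sanity check on the estimates. A production implementation would also guard against separable data, where the maximum likelihood estimate does not exist.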
Each parameter estimate $\beta_j$ is interpreted as the additive effect on the log odds of a unit change in the $j$th explanatory variable. In the case of a dichotomous explanatory variable, for instance gender, $e^{\beta}$ is the estimate of the odds ratio of having the outcome for, say, males compared with females.
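The odds-ratio interpretation can be checked numerically. The sketch below assumes a model with an intercept and one dichotomous variable coded 0/1; the coefficient values are hypothetical and chosen only to show that the ratio of the two odds equals the exponential of the coefficient.

```python
import math

def probability(beta0: float, beta1: float, x: float) -> float:
    """Success probability under logit(p) = beta0 + beta1 * x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(p: float) -> float:
    """Odds in favor of success for a probability p in (0, 1)."""
    return p / (1.0 - p)

# Hypothetical coefficients; x = 0 and x = 1 code the two groups.
beta0, beta1 = -0.3, 0.7
odds_group0 = odds(probability(beta0, beta1, 0.0))
odds_group1 = odds(probability(beta0, beta1, 1.0))

# The odds ratio between the groups agrees with exp(beta1) up to rounding.
odds_ratio = odds_group1 / odds_group0
```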
The model has an equivalent formulation as
$$p_i = \frac{1}{1 + e^{-(\beta_1 x_{1,i} + \cdots + \beta_k x_{k,i})}}.$$
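The two formulations are inverses of each other: the logit maps a probability to a linear predictor, and the logistic function maps it back. A minimal sketch follows, with hypothetical function names and coefficients; the leading 1.0 in the explanatory vector plays the role of the intercept term.

```python
import math

def logit(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Logistic function: maps a linear predictor eta to a probability."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # this branch avoids overflow for large negative eta
    return z / (1.0 + z)

def predict(beta, x):
    """Probability for coefficient list beta and explanatory vector x."""
    eta = sum(b * xj for b, xj in zip(beta, x))
    return inv_logit(eta)

# Hypothetical coefficients; the linear predictor is -1.0 + 0.5 * 2.0 = 0.
beta = [-1.0, 0.5]
x = [1.0, 2.0]
p = predict(beta, x)  # a linear predictor of 0 corresponds to p = 0.5
```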
This functional form is commonly identified as a single-layer "perceptron" or single-layer artificial neural network. Such a single-layer network computes a continuous output instead of a step function. The derivative of $p_i$ with respect to $X = (x_1, \dots, x_k)$ is computed from the general form
$$y = \frac{1}{1 + e^{-f(X)}},$$
where $f(X)$ is an analytic function in $X$. With this choice, the single-layer network is identical to the logistic regression model. This function has a continuous derivative, which allows it to be used in backpropagation. It is also preferred because its derivative is easily calculated:
$$y' = y(1 - y)\,\frac{\mathrm{d}f}{\mathrm{d}X}.$$
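The derivative identity can be verified numerically in the simplest case, where the inner function is the identity so that its derivative is 1. The sketch below compares the closed form y(1 - y) with a central finite difference; the evaluation point and step size are arbitrary choices.

```python
import math

def sigmoid(t: float) -> float:
    """Logistic function y = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def sigmoid_derivative(t: float) -> float:
    """Closed-form derivative y * (1 - y), reusing the function value."""
    y = sigmoid(t)
    return y * (1.0 - y)

# Compare against a central finite difference at an arbitrary point.
t, h = 0.3, 1e-6
numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2.0 * h)
analytic = sigmoid_derivative(t)
```

Reusing the already computed output y is what makes the derivative cheap during backpropagation: no further exponential has to be evaluated.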
Extensions
Extensions of the model exist to cope with multi-category dependent variables and ordinal dependent variables, such as polytomous regression. Multi-class classification by logistic regression is known as multinomial logit modeling. An extension of the logistic model to sets of interdependent variables is the conditional random field.
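For the multi-class case, multinomial logit models are usually written with the softmax function, which turns one linear score per class into class probabilities. The sketch below is an illustration under that assumption, with made-up scores; it also checks that with two classes, one of them fixed at score 0 as the reference, softmax reduces to the ordinary logistic function.

```python
import math

def softmax(scores):
    """Convert a list of real-valued class scores into probabilities."""
    m = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two classes: the reference class has score 0, the other a hypothetical 1.3.
probs = softmax([0.0, 1.3])
binary = 1.0 / (1.0 + math.exp(-1.3))  # logistic function of the same score

# Three classes with made-up scores.
probs3 = softmax([0.2, -1.0, 2.5])
```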
Example
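Let $p(x)$ be the probability of success when the value of the predictor variable is $x$. Then let
$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}.$$
Algebraic manipulation shows that
$$\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x},$$
where $p(x)/(1 - p(x))$ is the odds in favor of success.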
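The algebraic step can be spelled out as follows, assuming the single-predictor logistic form for $p(x)$ and writing $\eta = \beta_0 + \beta_1 x$ for the linear predictor (a shorthand introduced here for brevity):
$$
\begin{aligned}
p(x) &= \frac{e^{\eta}}{1 + e^{\eta}}, \\
1 - p(x) &= \frac{1 + e^{\eta} - e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{\eta}}, \\
\frac{p(x)}{1 - p(x)} &= \frac{e^{\eta}}{1 + e^{\eta}} \cdot \left(1 + e^{\eta}\right) = e^{\eta}.
\end{aligned}
$$
Taking logarithms of the last line gives $\ln\bigl(p(x)/(1 - p(x))\bigr) = \beta_0 + \beta_1 x$, so the log odds are linear in $x$.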
If we take a simple example, say $p(50) = 2/3$, then
$$\frac{p(50)}{1 - p(50)} = \frac{2/3}{1 - 2/3} = 2.$$
So when $x = 50$, a success is twice as likely as a failure; put simply, the odds are 2 to 1.
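The same arithmetic can be reproduced exactly with rational numbers. This short sketch uses Python's `fractions` module; the helper names are chosen here for illustration.

```python
from fractions import Fraction

def odds(p):
    """Odds in favor of success for a probability p."""
    return p / (1 - p)

def probability_from_odds(o):
    """Inverse conversion: probability corresponding to odds o."""
    return o / (1 + o)

p50 = Fraction(2, 3)                       # probability of success at x = 50
odds50 = odds(p50)                         # (2/3) / (1/3)
recovered = probability_from_odds(odds50)  # converts the odds back to 2/3
```

Using `Fraction` rather than floating point keeps the result exact, so the odds come out as exactly 2 rather than a nearby decimal.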
See also
- Artificial neural network
- Data mining
- Linear discriminant analysis
- Perceptron
- Probit model
- Variable rules analysis
- Jarrow-Turnbull model
External links
- Web-based logistic regression calculator
- A highly optimized Maximum Entropy modeling package
- MALLET Java library, includes a trainer for logistic models

