predict.plot(y~x,frame)
p(y=1|x) = a + b*x

This doesn't work because for some x it would give probabilities above 1 or below 0. The solution is to change to a different representation of probability. We could use the odds representation: p/(1-p), which ranges from 0 to infinity. That gets rid of the upper limit. We could use a log representation: log(p), which ranges from -infinity to 0. That gets rid of the lower limit. To get rid of both limits, we can use log-odds:
log(p(y=1|x)/(1-p(y=1|x))) = a + b*x

Solving for p gives
p(y=1|x) = 1/(1 + exp(-(a+b*x)))

This function is called the logistic function and is denoted sigma(a+b*x). This is called a generalized linear model because we've applied a transformation to a linear function. To fit this model to data, use glm instead of lm and specify a binomial family:
> fit <- glm(y~x,frame,family=binomial)
> fit
Coefficients:
(Intercept)            x
     -7.565       14.137
> model.plot(fit)
> predict(fit,data.frame(x=0.5),type="response")
[1] 0.3783538
> 1/(1 + exp(-(14.137*0.5 - 7.565)))
[1] 0.3783635
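If you want to draw this fit yourself with plain R, a minimal sketch is to plot the data and overlay the fitted probabilities (this assumes frame$y is coded 0/1):

# Sketch: data plus fitted logistic curve, base R only
plot(y ~ x, frame)
xs <- seq(min(frame$x), max(frame$x), length = 200)
lines(xs, predict(fit, data.frame(x = xs), type = "response"))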
How is logistic regression done? With a numerical response, we minimized the difference between the predicted and actual values. Here it is trickier, because we don't observe p(y=1|x), so we can't measure distance from a target. Instead what is typically done is the method of maximum likelihood, which says to maximize
sum_i log sigma(yi*(a + b*xi))

where yi is the observed response, coded as 1 or -1. Maximizing this is equivalent to minimizing the deviance on the training set. This criterion does have some problems; we will talk about its properties in the next lecture.
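To make the criterion concrete, here is a sketch that maximizes it directly with optim instead of calling glm. It assumes frame$y is coded 0/1 and should land near the coefficients found above:

# Recode the response as +1/-1, as in the formula above
s <- ifelse(frame$y == 1, 1, -1)
# sum_i log sigma(yi*(a + b*xi)), computed stably with plogis
loglik <- function(par) sum(plogis(s * (par[1] + par[2] * frame$x), log.p = TRUE))
# fnscale = -1 tells optim to maximize rather than minimize
opt <- optim(c(0, 0), loglik, control = list(fnscale = -1))
opt$par   # should be close to coef(fit): -7.565, 14.137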
Because logistic regression doesn't have a specific target function, it is difficult to diagnose the fit. It is possible to define "residuals" for logistic regression but they are difficult to interpret. Consequently we will have to treat logistic regression mostly as a "black box" in this course.
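If you want to look at them anyway, R will compute deviance residuals for any fitted glm; a minimal sketch (making sense of the plot is another matter):

# Deviance residuals against the linear predictor
plot(predict(fit), residuals(fit, type = "deviance"))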
With several predictors, the model becomes

p(y=1|x) = sigma(a + b1*x1 + b2*x2 + ...)

Consider the iris data, reduced to the two classes versicolor and virginica:
data(iris)
iris <- iris[(iris$Species != "setosa"),]
iris$Species <- factor(iris$Species)
predict.plot(Species~.,iris)
> fit <- glm(Species~.,iris,family=binomial)
> summary(fit)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -42.638     25.643  -1.663   0.0964 .
Sepal.Length   -2.465      2.393  -1.030   0.3029
Sepal.Width    -6.681      4.473  -1.494   0.1353
Petal.Length    9.429      4.725   1.995   0.0460 *
Petal.Width    18.286      9.723   1.881   0.0600 .

Let's do a regression with only two predictors and display it using cplot:
fit <- glm(Species~Petal.Width+Petal.Length,iris,family=binomial)
cplot(fit)
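As a rough base-R substitute for that picture, we can evaluate the fitted probabilities on a grid and draw the p = 0.5 decision boundary; the grid ranges below are just eyeballed for this data:

# Sketch: p = 0.5 boundary of the two-predictor fit
pl <- seq(3, 7, length = 100)
pw <- seq(1, 2.6, length = 100)
grid <- expand.grid(Petal.Length = pl, Petal.Width = pw)
p <- predict(fit, grid, type = "response")
plot(Petal.Width ~ Petal.Length, iris, col = iris$Species)
contour(pl, pw, matrix(p, 100, 100), levels = 0.5, add = TRUE, drawlabels = FALSE)
# Training error rate; glm models the second factor level ("virginica") as 1
phat <- predict(fit, iris, type = "response")
mean(ifelse(phat > 0.5, "virginica", "versicolor") != iris$Species)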
> predict.plot(type~.,x)
> fit <- glm(type~.,x,family=binomial)
> summary(fit)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.773055   1.769454  -5.523 3.33e-08 ***
npreg        0.103183   0.064677   1.595  0.11063
glu          0.032117   0.006784   4.734 2.20e-06 ***
bp          -0.004768   0.018535  -0.257  0.79701
skin        -0.001917   0.022492  -0.085  0.93209
bmi          0.083624   0.042813   1.953  0.05079 .
ped          1.820409   0.665225   2.737  0.00621 **
age          0.041184   0.022085   1.865  0.06221 .

Several of the coefficients are not significantly different from zero, suggesting that the corresponding predictors are not relevant. To automatically prune them away, we can use step:
> fit2 <- step(fit)
Step:  AIC= 190.47
 type ~ npreg + glu + bmi + ped + age

        Df Deviance    AIC
<none>       178.47 190.47
- npreg  1   181.08 191.08
- age    1   182.03 192.03
- bmi    1   184.74 194.74
- ped    1   186.55 196.55
- glu    1   206.06 216.06

> summary(fit2)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.938053   1.540586  -6.451 1.11e-10 ***
npreg        0.103142   0.064502   1.599  0.10981
glu          0.031809   0.006665   4.773 1.82e-06 ***
bmi          0.079672   0.032638   2.441  0.01464 *
ped          1.811415   0.660793   2.741  0.00612 **
age          0.039286   0.020962   1.874  0.06091 .

This gets us down to five predictors. From the last iteration of step we see that glu and ped are the most important, in the sense that removing either of them causes the largest increase in deviance. Let's look at what is happening between those two:
fit <- glm(type~glu+ped,x,family=binomial)
cplot(fit)
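To read a single prediction off this model, ask predict for type="response"; the glucose and pedigree values below are made up purely for illustration:

# Predicted probability of diabetes for a hypothetical patient
predict(fit, data.frame(glu = 150, ped = 0.5), type = "response")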
According to cross-validation, the best-sized tree makes a single cut at glu = 123.5. The deviance of this tree on the test set is 367, while logistic regression gets 318.
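For reference, a test-set deviance like the numbers quoted above can be computed by hand. Here is a sketch, where x.test stands for a held-out data frame with the same columns as x (if x is the Pima training data from MASS, Pima.te would play this role), and fit2 can be replaced by whichever fit is being evaluated:

# Deviance on a test set = -2 * log-likelihood of the observed classes
phat <- predict(fit2, x.test, type = "response")
y <- as.numeric(x.test$type) - 1    # 0/1, second factor level is the "1" class
-2 * sum(y * log(phat) + (1 - y) * log(1 - phat))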