Chapter 4 Exercise

Source: Internet | Editor: 程序博客网 | Date: 2024/04/30 00:26

Problem 13
Fit classification models on the Boston data set to predict whether a given suburb's crime rate is above or below the median.

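```r
library(MASS)  # provides the Boston data set and lda()

Boston$c.crim <- (Boston$crim > median(Boston$crim))
# Randomly split the data: about 25% test, 75% training
set.seed(122)
rands <- rnorm(nrow(Boston))
test <- (rands > quantile(rands, 0.75))
train <- !test
Boston.train <- Boston[train, ]
Boston.test <- Boston[test, ]
# Categorical (0/1 factor) response for the training set
Boston.train$crim <- factor(as.numeric(Boston.train$c.crim))
head(Boston.train)
```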
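The split keeps the rows whose random draw lies strictly above the 75th percentile of all draws, so about a quarter of the data becomes the test set. A minimal sketch that checks the resulting sizes, assuming `nrow(Boston)` is 506 as in the MASS version of the data (hard-coded so the snippet runs without loading the data):

```r
set.seed(122)
n <- 506                               # assumed: nrow(MASS::Boston)
rands <- rnorm(n)
test <- rands > quantile(rands, 0.75)  # strictly above the 75th percentile
train <- !test
c(test = sum(test), train = sum(train))  # test = 127, train = 379
```

With distinct draws the count does not depend on the seed: the default quantile falls between the 379th and 380th sorted values, so exactly 127 rows land in the test set. The 379 training rows agree with the 378 null-deviance degrees of freedom in the logistic regression summary.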

```r
# Logistic regression
logit.fit <- glm(crim ~ . - c.crim, data = Boston.train, family = binomial)
summary(logit.fit)

Call:
glm(formula = crim ~ . - c.crim, family = binomial, data = Boston.train)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3717  -0.1638  -0.0050   0.0027   3.5013  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -31.585466   7.570606  -4.172 3.02e-05 ***
zn           -0.056584   0.039644  -1.427 0.153492    
indus        -0.073904   0.050582  -1.461 0.143995    
chas          1.479926   0.920073   1.608 0.107729    
nox          49.194715   9.059145   5.430 5.62e-08 ***
rm           -0.764723   0.769264  -0.994 0.320176    
age           0.042474   0.014863   2.858 0.004269 ** 
dis           0.589066   0.259613   2.269 0.023268 *  
rad           0.611718   0.175804   3.480 0.000502 ***
tax          -0.007147   0.003233  -2.211 0.027059 *  
ptratio       0.344830   0.140583   2.453 0.014172 *  
black        -0.012981   0.006615  -1.962 0.049732 *  
lstat        -0.040590   0.056146  -0.723 0.469723    
medv          0.155290   0.077391   2.007 0.044795 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 523.76  on 378  degrees of freedom
Residual deviance: 160.87  on 365  degrees of freedom
AIC: 188.87

Number of Fisher Scoring iterations: 9
```
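The last lines of this summary can be checked by hand. The sketch below uses only the printed numbers: 379 training rows (378 null-deviance degrees of freedom plus one) and 14 estimated coefficients (intercept plus 13 predictors) give 379 - 14 = 365 residual degrees of freedom. With 0/1 responses the residual deviance equals -2 times the log-likelihood, so AIC is the residual deviance plus 2 × 14. The drop from the null to the residual deviance is the likelihood-ratio statistic for all 13 slopes jointly; referring it to a chi-squared distribution on 13 degrees of freedom is a large-sample approximation.

```r
# Values copied from summary(logit.fit)
n <- 379             # training rows: 378 null-deviance df + 1
k <- 14              # intercept + 13 predictors in crim ~ . - c.crim
null.dev <- 523.76   # "Null deviance"
resid.dev <- 160.87  # "Residual deviance"

resid.df <- n - k                # 365, as printed
aic <- resid.dev + 2 * k         # 188.87, as printed
# Likelihood-ratio test that all 13 slopes are zero
lr.stat <- null.dev - resid.dev  # 362.89
p.value <- pchisq(lr.stat, df = k - 1, lower.tail = FALSE)
c(resid.df = resid.df, aic = aic, lr.stat = lr.stat, p.value = p.value)
```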

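```r
glm.probs <- predict(logit.fit, Boston.test, type = "response")
glm.pred <- rep(0, nrow(Boston.test))
glm.pred[glm.probs > 0.50] <- 1
# Confusion matrix (rows: predicted, columns: true class);
# c.crim holds the TRUE/FALSE test labels, as.numeric() turns them into 0/1
table(glm.pred, as.numeric(Boston.test$c.crim))

glm.pred  0  1
       0 49  7
       1  2 69

# Overall prediction accuracy on the test set
mean(glm.pred == as.numeric(Boston.test$c.crim))
[1] 0.9291339
```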
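Overall accuracy hides how the errors split between the two classes. A minimal sketch that rebuilds the logistic-regression confusion matrix from the counts printed above and derives per-class rates, treating 1 (crime rate above the median) as the positive class; `class.rates` is a hypothetical helper, not a function from any package:

```r
# Counts from table(glm.pred, ...) above: rows = predicted, columns = true class
cm <- matrix(c(49, 2, 7, 69), nrow = 2,
             dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

# Hypothetical helper: rates for a 2x2 table, with "1" as the positive class
class.rates <- function(cm) {
  tp <- cm["1", "1"]; tn <- cm["0", "0"]
  fp <- cm["1", "0"]; fn <- cm["0", "1"]
  c(accuracy    = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),  # high-crime suburbs correctly flagged
    specificity = tn / (tn + fp))  # low-crime suburbs correctly left at 0
}

rates <- class.rates(cm)
round(rates, 4)  # accuracy 0.9291, sensitivity 0.9079, specificity 0.9608
```

Applied to the LDA table later in the post, `matrix(c(50, 1, 18, 58), nrow = 2, ...)`, the same arithmetic gives sensitivity 58/76 ≈ 0.763 against specificity 50/51 ≈ 0.980, so most of LDA's extra errors are missed high-crime suburbs.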

LDA model

```r
lda.fit <- lda(crim ~ nox + rad + medv + age + tax + ptratio, data = Boston.train)
lda.fit

Call:
lda(crim ~ nox + rad + medv + age + tax + ptratio, data = Boston.train)

Prior probabilities of groups:
        0         1 
0.5329815 0.4670185 

Group means:
        nox       rad     medv      age      tax  ptratio
0 0.4729441  4.183168 24.58069 50.94010 309.7475 17.93614
1 0.6347175 14.559322 20.13729 86.69944 503.7062 18.90226

Coefficients of linear discriminants:
                 LD1
nox      8.247306805
rad      0.087278767
medv     0.030474664
age      0.016015886
tax     -0.001093165
ptratio  0.028299344

# predict() returns a list; its $class component holds the predicted labels
lda.pred <- predict(lda.fit, Boston.test)$class
table(lda.pred, as.numeric(Boston.test$c.crim))

lda.pred  0  1
       0 50 18
       1  1 58

mean(lda.pred == as.numeric(Boston.test$c.crim))
[1] 0.8503937
```
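`predict()` on an `lda` fit returns a list whose `class` component is the label with the larger posterior probability; with two classes that is the same as thresholding `posterior[, "1"]` at 0.5. Since 18 of the 19 LDA test errors above are high-crime suburbs predicted as 0, one option is to lower that threshold. A sketch, assuming the MASS package is installed (for `Boston` and `lda()`) and repeating the split procedure so it runs on its own; the 0.3 cutoff is a hypothetical value for illustration, not a tuned one:

```r
library(MASS)  # assumed installed: provides Boston and lda()

# Recreate the binary response and the 75/25 split used above
Boston$c.crim <- Boston$crim > median(Boston$crim)
set.seed(122)
rands <- rnorm(nrow(Boston))
test <- rands > quantile(rands, 0.75)
Boston.train <- Boston[!test, ]
Boston.test <- Boston[test, ]
Boston.train$crim <- factor(as.numeric(Boston.train$c.crim))

lda.fit <- lda(crim ~ nox + rad + medv + age + tax + ptratio, data = Boston.train)
lda.out <- predict(lda.fit, Boston.test)  # list with class, posterior, x
post1 <- lda.out$posterior[, "1"]         # P(crime rate above median | predictors)
truth <- as.numeric(Boston.test$c.crim)

# Thresholding at 0.5 reproduces predict()'s class labels
pred.05 <- as.numeric(post1 > 0.5)
# Hypothetical lower cutoff: flags more suburbs as high-crime
pred.03 <- as.numeric(post1 > 0.3)

table(pred.05, truth)
table(pred.03, truth)
mean(pred.05 == truth)
mean(pred.03 == truth)
```

A lower cutoff can only add predicted 1s, so it trades specificity for sensitivity; whether overall accuracy improves depends on the data and is not guaranteed.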

0 0