统计建模与R软件第六章习题…
来源:互联网 发布:mac 上不了网 编辑:程序博客网 时间:2024/06/04 23:26
原文地址:统计建模与R软件第六章习题答案(回归分析)作者:蘓木柒
Ex6.1
(1)
> x<- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0,6.4)
> y<- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273,3113,2493)
>plot(x,y)
由此判断,Y和X有线性关系。
(2)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)
Call:
lm(formula = y ~ 1 +x)
Residuals:
Min 1Q Median 3Q Max
-128.591-70.978 -3.727 49.263 167.228
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept) 140.95 125.11 1.127 0.293
x 364.18 19.26 18.908 6.33e-08 ***
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 96.42on 8 degrees of freedom
Multiple R-squared:0.9781, Adjusted R-squared: 0.9754
F-statistic: 357.5 on 1 and 8DF, p-value: 6.33e-08
回归方程为Y=140.95+364.18X
(3)
β1项很显著,但常数项β0不显著。
回归方程很显著。
(4)
> new<- data.frame(x=7)
>lm.pred<-predict(lm.sol,new,interval="prediction")
>lm.pred
fit lwr upr
1 2690.227 2454.9712925.484
故Y(7)= 2690.227,[2454.971,2925.484]
Ex6.2
(1)
>pho<-data.frame(x1<-c(0.4,0.4,3.1,0.6,4.7,1.7,9.4,10.1,11.6,12.6,10.9,23.1,23.1,21.6,23.1,1.9,26.8,29.9),x2 <-c(52,34,19,34,24,65,44,31,29,58,37,46,50,44,56,36,58,51), x3<-c(158,163,37,157,59,123,46,117,173,112,111,114,134,73,168,143,202,124),y <-c(64,60,71,61,54,77,81,93,93,51,76,96,77,93,95,54,168,99))
>lm.sol<-lm(y~x1+x2+x3,data=pho)
>summary(lm.sol)
Call:
lm(formula = y ~ x1 + x2 + x3,data = pho)
Residuals:
Min 1Q Median 3Q Max
-27.575-11.160 -2.799 11.574 48.808
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept)44.9290 18.3408 2.450 0.02806 *
x1 1.8033 0.5290 3.409 0.00424 **
x2 -0.1337 0.4440 -0.301 0.76771
x3 0.1668 0.1141 1.462 0.16573
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 19.93on 14 degrees of freedom
Multiple R-squared:0.551, Adjusted R-squared: 0.4547
F-statistic: 5.726 on 3 and 14DF, p-value: 0.009004
回归方程为y=44.9290+1.8033x1-0.1337x2+0.1668x3
(2)
回归方程显著,但有些回归系数不显著。
(3)
>lm.step<-step(lm.sol)
Start:AIC=111.2
y ~ x1 + x2 +x3
Df Sum ofSq RSS AIC
-x2 1 36.0 5599.4 109.3
<none> 5563.4 111.2
-x3 1 849.8 6413.1 111.8
-x1 1 4617.810181.2 120.1
Step:AIC=109.32
y ~ x1 + x3
Df Sum ofSq RSS AIC
<none> 5599.4 109.3
-x3 1 833.2 6432.6 109.8
-x1 1 5169.510768.9 119.1
>summary(lm.step)
Call:
lm(formula = y ~ x1 + x3, data= pho)
Residuals:
Min 1Q Median 3Q Max
-29.713-11.324 -2.953 11.286 48.679
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept)41.4794 13.8834 2.988 0.00920 **
x1 1.7374 0.4669 3.721 0.00205 **
x3 0.1548 0.1036 1.494 0.15592
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 19.32on 15 degrees of freedom
Multiple R-squared:0.5481, Adjusted R-squared: 0.4878
F-statistic: 9.095 on 2 and 15DF, p-value: 0.002589
x3仍不够显著。
再用drop1函数做逐步回归。
>drop1(lm.step)
Single termdeletions
Model:
y ~ x1 + x3
Df Sum ofSq RSS AIC
<none> 5599.4 109.3
x1 1 5169.510768.9 119.1
x3 1 833.2 6432.6 109.8
可以考虑再去掉x3.
>lm.opt<-lm(y~x1,data=pho);summary(lm.opt)
Call:
lm(formula = y ~ x1, data =pho)
Residuals:
Min 1Q Median 3Q Max
-31.486-8.282 -1.674 5.623 59.337
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept)59.2590 7.4200 7.986 5.67e-07***
x1 1.8434 0.4789 3.849 0.00142 **
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 20.05on 16 degrees of freedom
Multiple R-squared:0.4808, Adjusted R-squared: 0.4484
F-statistic: 14.82 on 1 and 16DF, p-value: 0.001417
皆显著。
Ex6.3
>x<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,6,6,6,7,7,7,8,8,8,9,11,12,12,12)
>y<-c(0.6,1.6,0.5,1.2,2.0,1.3,2.5,2.2,2.4,1.2,3.5,4.1,5.1,5.7,3.4,9.7,8.6,4.0,5.5,10.5,17.5,13.4,4.5,30.4,12.4,13.4,26.2,7.4)
>plot(x,y)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)
Call:
lm(formula = y ~ 1 +x)
Residuals:
Min 1Q Median 3Q Max
-9.8413 -2.3369-0.0214 1.0592 17.8320
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept)-1.4519 1.8353 -0.791 0.436
x 1.5578 0.2807 5.549 7.93e-06***
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 5.168on 26 degrees of freedom
Multiple R-squared:0.5422, Adjusted R-squared: 0.5246
F-statistic:30.8 on 1 and 26 DF, p-value:7.931e-06
线性回归方程为 y=-1.4519+1.5578x,通过F检验。 常数项参数未通过t 检验。
>abline(lm.sol)
>y.yes<-resid(lm.sol)
>y.fit<-predict(lm.sol)
>y.rst<-rstandard(lm.sol)
>plot(y.yes~y.fit)
>plot(y.rst~y.fit)
残差并非是等方差的。
修正模型,对相应变量Y做开方。
>lm.new<-update(lm.sol,sqrt(.)~.)
>summary(lm.new)
Call:
lm(formula = sqrt(y) ~x)
Residuals:
Min 1Q Median 3Q Max
-1.54255 -0.45280-0.01177 0.34925 2.12486
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept)0.76650 0.25592 2.995 0.00596 **
x 0.29136 0.03914 7.444 6.64e-08***
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 0.7206on 26 degrees of freedom
Multiple R-squared:0.6806, Adjusted R-squared: 0.6684
F-statistic: 55.41 on 1 and 26DF, p-value: 6.645e-08
此时所有参数和方程均通过检验。
对新模型做标准化残差图,情况有所改善,不过还是存在一个离群值。第24和第28个值存在问题。
Ex6.4
>toothpaste<-data.frame( X1=c(-0.05,0.25,0.60,0,0.20, 0.15,-0.15, 0.15,0.10,0.40,0.45,0.35,0.30,0.50,0.50,0.40,-0.05,-0.05,-0.10,0.20,0.10,0.50,0.60,-0.05,0,0.05,0.55),X2=c(5.50,6.75,7.25,5.50,6.50,6.75,5.25,6.00,6.25,7.00,6.90,6.80,6.80,7.10,7.00,6.80,6.50,6.25,6.00,6.50,7.00,6.80,6.80,6.50,5.75,5.80,6.80),Y=c(7.38,8.51,9.52,7.50,8.28,8.75,7.10,8.00,8.15,9.10,8.86,8.90,8.87,9.26,9.00,8.75,7.95,7.65,7.27,8.00,8.50,8.75,9.21,8.27,7.67,7.93,9.26))
>lm.sol<-lm(Y~X1+X2,data=toothpaste);summary(lm.sol)
Call:
lm(formula = Y ~ X1 + X2, data= toothpaste)
Residuals:
Min 1Q Median 3Q Max
-0.37130-0.10114 0.03066 0.10016 0.30162
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept) 4.0759 0.6267 6.504 1.00e-06***
X1 1.5276 0.2354 6.489 1.04e-06***
X2 0.6138 0.1027 5.974 3.63e-06***
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 0.1767on 24 degrees of freedom
Multiple R-squared:0.9378, Adjusted R-squared: 0.9327
F-statistic: 181 on 2 and 24 DF, p-value:3.33e-15
回归诊断:
>influence.measures(lm.sol)
Influence measuresof
lm(formula = Y ~ X1 + X2, data = toothpaste) :
dfb.1_ dfb.X1 dfb.X2 dffitcov.r cook.d hatinf
1 0.00908 0.00260 -0.00847 0.01211.366 5.11e-050.1681
2 0.06277 0.04467 -0.06785 -0.1244 1.159 5.32e-030.0537
3-0.02809 0.07724 0.02540 0.1858 1.283 1.19e-020.1386
4 0.11688 0.05055 -0.11078 0.14041.377 6.83e-03 0.1843 *
5 0.01167 0.01887 -0.01766 -0.1037 1.141 3.69e-030.0384
6 -0.43010-0.42881 0.45774 0.6061 0.8141.11e-010.0936
7 0.07840 0.01534 -0.07284 0.10821.481 4.07e-03 0.2364 *
8 0.01577 0.00913 -0.01485 0.02081.237 1.50e-040.0823
9 0.01127 -0.02714 -0.00364 0.1071 1.156 3.95e-030.0466
10 -0.078300.00171 0.08052 0.1890 1.1551.22e-020.0726
11 0.00301-0.09652 -0.00365 -0.2281 1.127 1.76e-020.0735
12 -0.031140.01848 0.03459 0.1542 1.1328.12e-030.0514
13 -0.09236-0.03801 0.09940 0.2201 1.0711.62e-020.0522
14 -0.026500.03434 0.02606 0.1179 1.2354.81e-030.0956
15 0.00968-0.11445 -0.00857 -0.2545 1.150 2.19e-020.0910
16 -0.00285-0.06185 0.00098 -0.1608 1.146 8.83e-030.0594
170.07201 0.09744 -0.07796 -0.1099 1.364 4.19e-030.1731
180.15132 0.30204 -0.17755 -0.3907 1.087 5.04e-020.1085
190.07489 0.47472 -0.12980 -0.7579 0.731 1.66e-010.1092
200.05249 0.08484 -0.07940 -0.4660 0.625 6.11e-020.0384 *
210.07557 0.07284 -0.07861 -0.0880 1.471 2.69e-030.2304 *
22 -0.17959-0.39016 0.18241 -0.5494 0.912 9.41e-020.1022
230.06026 0.10607 -0.06207 0.12511.374 5.42e-030.1804
24 -0.54830-0.74197 0.59358 0.8371 0.9142.13e-010.1731
250.08541 0.01624 -0.07775 0.13141.249 5.97e-030.1069
260.32556 0.11734 -0.30200 0.44801.018 6.49e-020.1033
270.17243 0.32754 -0.17676 0.41271.148 5.66e-020.1369
>source("Reg_Diag.R");Reg_Diag(lm.sol) #薛毅老师自己写的程序
residual s1 standards2 student s3 hat_matrixs4 DFFITS s5
1 0.00443843 0.02753865 0.02695925 0.16811819 0.01211949
2-0.09114255 -0.53021138 -0.52211469 0.05369239 -0.12436727
3 0.07726887 0.47112863 0.46335666 0.13857353 0.18584310
4 0.04805665 0.30111062 0.29532912 0.18427663 0.14036860
5-0.09130271 -0.52689847 -0.51881406 0.03838430 -0.10365442
6 0.30162101 1.79287913 1.88596579 0.09362223 0.60613406
7 0.03066005 0.19855842 0.19453763 0.23641540 * 0.10824626
8 0.01199519 0.07085860 0.06937393 0.08226537 0.02077047
9 0.08491891 0.49217591 0.48426323 0.04664158 0.10711246
100.11625405 0.68315814 0.67537315 0.07261134 0.18897969
11-0.13874451 -0.81570765 -0.80983786 0.07348894 -0.22807820
120.11540228 0.67051940 0.66263761 0.05137589 0.15420864
130.16178406 0.94041623 0.93806144 0.05219432 0.22013204
140.06210727 0.36957277 0.36282531 0.09557411 0.11794546
15-0.13650951 -0.81026658 -0.80428349 0.09101221 -0.25449541
16-0.11097950 -0.64757782 -0.63955524 0.05943308 -0.16076716
17-0.03939381 -0.24515626 -0.24029557 0.17309048 -0.10993940
18-0.18593575 -1.11438446 -1.12029013 0.10845395 -0.39073410
19-0.33609591 -2.01522068 * -2.16439284 *0.10922236 -0.75789180 *
20-0.37130271 * -2.14274943 *-2.33258738 *0.03838430 -0.46603012
21-0.02545527 -0.16420856 -0.16084153 0.23042354 *-0.08801075
22-0.26374306 -1.57517595 -1.62848498 0.10217431 -0.54936198
230.04349338 0.27187605 0.26656251 0.18041800 0.12506702
240.28060619 1.74627363 1.82969510 0.17309048 0.83711731 *
250.06459859 0.38683016 0.37987153 0.10691352 0.13143357
260.21752520 1.29995371 1.31989945 0.10329116 0.44796770
270.16987516 1.03474390 1.03633781 0.13685835 0.41266341
cooks_distance s6 COVRATIO s7
1 5.108777e-05 1.3656752
2 5.316885e-03 1.1589547
3 1.190200e-02 1.2827036
4 6.827446e-03 1.3771332
5 3.693897e-03 1.1410104
6 1.106753e-01 0.8143179
7 4.068871e-03 1.4806452 *
8 1.500251e-04 1.2372586
9 3.950358e-03 1.1560508
10 1.218047e-02 1.1550557
11 1.759216e-02 1.1271148
12 8.116460e-03 1.1316638
13 1.623390e-02 1.0710597
14 4.811117e-03 1.2349272
15 2.191171e-02 1.1501502
16 8.832858e-03 1.1457602
17 4.193532e-03 1.3637206
18 5.035591e-02 1.0866343
19 1.659840e-01 0.7313914
20 6.109050e-02 0.6248838
21 2.691197e-03 1.4714103
22 9.412101e-02 0.9121786
23 5.423856e-03 1.3735324
24 2.127740e-01 *0.9139942
25 5.971157e-03 1.2485557
26 6.488529e-02 1.0178195
27 5.658922e-02 1.1479080
为什么两种方法检测结果不一样呢...不继续了
Ex6.5
>cement<-data.frame(X1=c( 7, 1, 11,11, 7, 11, 3, 1, 2, 21, 1, 11, 10),X2=c(26,29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),X3=c( 6,15, 8, 8, 6, 9, 17, 22, 18, 4,23, 9, 8),X4=c(60, 52, 20, 47,33, 22, 6, 44, 22, 26, 34, 12, 12),Y =c(78.5,74.3, 104.3, 87.6, 95.9, 109.2,102.7, 72.5, 93.1,115.9, 83.8, 113.3,109.4))
>xx<-cor(cement[1:4])
>kappa(xx,exact=T)
[1] 1376.881
大于1000,认为有严重的多重共线性。
>eigen(xx)
$values
[1] 2.235704035 1.5760660700.186606149 0.001623746
$vectors
[,1] [,2] [,3] [,4]
[1,]-0.4759552 0.5089794 0.67550020.2410522
[2,] -0.5638702 -0.4139315-0.3144204 0.6417561
[3,]0.3940665 -0.6049691 0.63769110.2684661
[4,]0.5479312 0.4512351 -0.19542100.6767340
即2.235704035X1+1.576066070X2+0.186606149X3+0.001623746X4=0
可以忽略X4项,可以看出X1,X2,X3存在共线性。
删去X3和X4后:
>xx<-cor(cement[1:2])
>kappa(xx,exact=T)
[1] 1.59262
共线性消失了。
如果去掉X3呢?
>cement1<-cbind(cement$X1,cement$X2,cement$X4)
>xx<-cor(cement1)
>kappa(xx,exact=T)
[1] 77.25113
如果去掉X4呢?
>xx<-cor(cement[1:3])
>kappa(xx,exact=T)
[1] 11.12112
看起来去掉X3和X4是合理的。
Ex6.6
>infection<-data.frame(x1<-c(0,1,0,0,0,1,1,1),x2<-c(0,0,1,0,1,0,1,1),x3<-c(0,0,0,1,1,1,0,1),success<-c(0,0,23,8,28,0,11,1),fail<-c(9,0,3,32,30,2,87,17))
>infection$Ymat<-cbind(infection$success,infection$fail)
>glm.sol<-glm(Ymat~x1+x2+x3,family=binomial,data=infection)
>summary(glm.sol)
Call:
glm(formula = Ymat ~ x1 + x2 +x3, family = binomial, data = infection)
DevianceResiduals:
1 2 3 4 5 6 7 8
-2.56229 0.00000 1.49623 1.21563 -0.78520 -0.15231 -0.07162 0.26470
Coefficients:
Estimate Std. Error z valuePr(>|z|)
(Intercept)-0.8207 0.4947 -1.659 0.0971 .
x1 -3.2544 0.4813 -6.761 1.37e-11 ***
x2 2.0299 0.4553 4.459 8.25e-06***
x3 -1.0720 0.4254 -2.520 0.0117 *
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
(Dispersion parameter forbinomial family taken to be 1)
Null deviance: 83.491 on 6 degrees of freedom
Residual deviance:10.997 on 3 degrees offreedom
AIC: 36.178
Number of Fisher Scoringiterations: 5
回归模型为
P=exp(-0.8207-3.2544x1+2.0299x2-1.0720x3)/(1+exp(-0.8207-3.2544x1+2.0299x2-1.0720x3))
Ex6.7
(1)
>x<-c(rep(0:6,rep(2,7)))
>y<-c(508.1,498.4,568.2,577.3,651.7,657.0,713.4,697.5,755.3,758.9,787.6,792.1,841.4,831.8)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)
Call:
lm(formula = y ~ 1 +x)
Residuals:
Min 1Q Median 3Q Max
-25.400-11.484 -3.779 14.629 24.921
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept)523.800 8.474 61.81 < 2e-16 ***
x 54.893 2.350 23.36 2.26e-11***
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 17.59on 12 degrees of freedom
Multiple R-squared:0.9785, Adjusted R-squared: 0.9767
F-statistic: 545.5 on 1 and 12DF, p-value: 2.265e-11
线性回归模型为y=523.800+54.893x,通过t检验和F检验。
(2)
>lm.sol<-lm(y~1+x+I(x^2));summary(lm.sol)
Call:
lm(formula = y ~ 1 + x +I(x^2))
Residuals:
Min 1Q Median 3Q Max
-10.6643-5.6622 -0.4655 5.5000 10.6679
Coefficients:
Estimate Std. Error t valuePr(>|t|)
(Intercept)502.5560 4.8500 103.619 < 2e-16***
x 80.3857 3.7861 21.232 2.81e-10 ***
I(x^2) -4.2488 0.6063 -7.008 2.25e-05 ***
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
Residual standard error: 7.858on 11 degrees of freedom
Multiple R-squared:0.9961, Adjusted R-squared: 0.9953
F-statistic:1391 on 2 and 11 DF, p-value:5.948e-14
多项式回归模型为:
y=502.5560+80.3857x-4.2488x^2,通过t检验和F检验。
(3)作散点图和拟合曲线:
>plot(x,y)
>xfit<-seq(0,6,0.01)
>yfit<-predict(lm.sol,data.frame(x=xfit))
>lines(xfit,yfit)
Ex6.8
读入数据:
>cancer<-read.table("data",header=T)
>cancer
x1 x2x3 x4 x5 y
1 7064 5 1 11
2 6063 9 1 10
3 70 6511 1 1 0
4 40 6910 1 1 0
5 40 6358 1 1 0
6 7048 9 1 10
7 70 4811 1 1 0
8 8063 4 2 10
9 60 6314 2 1 0
10 30 534 2 1 0
11 80 43 122 1 0
12 40 552 2 1 0
13 60 66 252 1 1
14 40 67 232 1 0
15 20 61 193 1 0
16 50 634 3 1 0
17 50 66 160 1 0
18 40 68 120 1 0
19 80 41 120 1 1
20 70 538 0 1 1
21 60 37 131 1 0
22 90 54 121 0 1
23 50 528 1 0 1
24 70 507 1 0 1
25 20 65 211 0 0
26 80 52 281 0 1
27 60 70 131 0 0
28 50 40 131 0 0
29 70 36 222 0 0
30 40 44 362 0 0
31 30 549 2 0 0
32 30 59 872 0 0
33 40 695 3 0 0
34 60 50 223 0 0
35 80 624 3 0 0
36 70 68 150 0 0
37 30 394 0 0 0
38 60 49 110 0 0
39 80 64 100 0 1
40 70 67 180 0 1
>glm.sol<-glm(y~x1+x2+x3+x4+x5,family=binomial,data=cancer);summary(glm.sol)
Call:
glm(formula = y ~ x1 + x2 + x3+ x4 + x5, family = binomial,
data = cancer)
DevianceResiduals:
Min 1Q Median 3Q Max
-1.71500-0.66725 -0.22254 0.09936 2.23936
Coefficients:
Estimate Std. Error z valuePr(>|z|)
(Intercept)-7.01140 4.47534 -1.567 0.1172
x1 0.09994 0.04304 2.322 0.0202 *
x2 0.01415 0.04697 0.301 0.7631
x3 0.01749 0.05458 0.320 0.7486
x4 -1.08297 0.58721 -1.844 0.0651.
x5 -0.61309 0.96066 -0.638 0.5233
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
(Dispersion parameter forbinomial family taken to be 1)
Null deviance: 44.987 on 39 degrees of freedom
Residual deviance:28.392 on 34 degrees offreedom
AIC: 40.392
Number of Fisher Scoringiterations: 6
有的系数并不显著。
下面做逐步回归:
>glm.new<-step(glm.sol)
Start:AIC=40.39
y ~ x1 + x2 + x3 + x4 +x5
Df Deviance AIC
-x3 1 28.48438.484
-x2 1 28.48438.484
-x5 1 28.79938.799
<none> 28.392 40.392
-x4 1 32.64242.642
-x1 1 38.30648.306
Step:AIC=38.48
y ~ x1 + x2 + x4 +x5
Df Deviance AIC
-x2 1 28.56436.564
-x5 1 28.99336.993
<none> 28.484 38.484
-x4 1 32.70540.705
-x1 1 38.47846.478
Step:AIC=36.56
y ~ x1 + x4 +x5
Df Deviance AIC
-x5 1 29.07335.073
<none> 28.564 36.564
-x4 1 32.89238.892
-x1 1 38.47844.478
Step:AIC=35.07
y ~ x1 + x4
Df Deviance AIC
<none> 29.073 35.073
-x4 1 33.53537.535
-x1 1 39.13143.131
只留下x1和x4两个变量。
>summary(glm.new)
Call:
glm(formula = y ~ x1 + x4,family = binomial, data = cancer)
DevianceResiduals:
Min 1Q Median 3Q Max
-1.4825-0.6617 -0.1877 0.1227 2.2844
Coefficients:
Estimate Std. Error z valuePr(>|z|)
(Intercept)-6.13755 2.73844 -2.241 0.0250*
x1 0.09759 0.04079 2.393 0.0167 *
x4 -1.12524 0.60239 -1.868 0.0618.
---
Signif.codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1
(Dispersion parameter forbinomial family taken to be 1)
Null deviance: 44.987 on 39 degrees of freedom
Residual deviance:29.073 on 37 degrees offreedom
AIC: 35.073
Number of Fisher Scoringiterations: 6
回归方程为
P=exp(-6.13755+0.09759x1-1.12524x4)/(1+exp(-6.13755+0.09759x1-1.12524x4))
概率估计略。
Ex6.9
我表示不想做了...
我没弄明白nls()函数里所说的start即初始值怎么设置。好像可以随便设置,只要保证函数收敛即可??
(1)
> x<- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0,6.4)
> y<- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273,3113,2493)
>plot(x,y)
由此判断,Y和X有线性关系。
(2)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)
Call:
lm(formula = y ~ 1 +x)
Residuals:
-128.591
Coefficients:
(Intercept)
x
---
Signif.codes:
Residual standard error: 96.42on 8 degrees of freedom
Multiple R-squared:0.9781,
F-statistic: 357.5 on 1 and 8DF,
回归方程为Y=140.95+364.18X
(3)
β1项很显著,但常数项β0不显著。
回归方程很显著。
(4)
> new<- data.frame(x=7)
>lm.pred<-predict(lm.sol,new,interval="prediction")
>lm.pred
1 2690.227 2454.9712925.484
故Y(7)= 2690.227,[2454.971,2925.484]
Ex6.2
(1)
>pho<-data.frame(x1<-c(0.4,0.4,3.1,0.6,4.7,1.7,9.4,10.1,11.6,12.6,10.9,23.1,23.1,21.6,23.1,1.9,26.8,29.9),x2 <-c(52,34,19,34,24,65,44,31,29,58,37,46,50,44,56,36,58,51), x3<-c(158,163,37,157,59,123,46,117,173,112,111,114,134,73,168,143,202,124),y <-c(64,60,71,61,54,77,81,93,93,51,76,96,77,93,95,54,168,99))
>lm.sol<-lm(y~x1+x2+x3,data=pho)
>summary(lm.sol)
Call:
lm(formula = y ~ x1 + x2 + x3,data = pho)
Residuals:
-27.575-11.160
Coefficients:
(Intercept)
x1
x2
x3
---
Signif.codes:
Residual standard error: 19.93on 14 degrees of freedom
Multiple R-squared:0.551,
F-statistic: 5.726 on 3 and 14DF,
回归方程为y=44.9290+1.8033x1-0.1337x2+0.1668x3
(2)
回归方程显著,但有些回归系数不显著。
(3)
>lm.step<-step(lm.sol)
Start:
y ~ x1 + x2 +x3
-x2
<none>
-x3
-x1
Step:
y ~ x1 + x3
<none>
-x3
-x1
>summary(lm.step)
Call:
lm(formula = y ~ x1 + x3, data= pho)
Residuals:
-29.713-11.324
Coefficients:
(Intercept)
x1
x3
---
Signif.codes:
Residual standard error: 19.32on 15 degrees of freedom
Multiple R-squared:0.5481,
F-statistic: 9.095 on 2 and 15DF,
x3仍不够显著。
再用drop1函数做逐步回归。
>drop1(lm.step)
Single termdeletions
Model:
y ~ x1 + x3
<none>
x1
x3
可以考虑再去掉x3.
>lm.opt<-lm(y~x1,data=pho);summary(lm.opt)
Call:
lm(formula = y ~ x1, data =pho)
Residuals:
-31.486
Coefficients:
(Intercept)
x1
---
Signif.codes:
Residual standard error: 20.05on 16 degrees of freedom
Multiple R-squared:0.4808,
F-statistic: 14.82 on 1 and 16DF,
皆显著。
Ex6.3
>x<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,6,6,6,7,7,7,8,8,8,9,11,12,12,12)
>y<-c(0.6,1.6,0.5,1.2,2.0,1.3,2.5,2.2,2.4,1.2,3.5,4.1,5.1,5.7,3.4,9.7,8.6,4.0,5.5,10.5,17.5,13.4,4.5,30.4,12.4,13.4,26.2,7.4)
>plot(x,y)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)
Call:
lm(formula = y ~ 1 +x)
Residuals:
-9.8413 -2.3369-0.0214
Coefficients:
(Intercept)
x
---
Signif.codes:
Residual standard error: 5.168on 26 degrees of freedom
Multiple R-squared:0.5422,
F-statistic:
线性回归方程为 y=-1.4519+1.5578x,通过F检验。 常数项参数未通过t 检验。
>abline(lm.sol)
>y.yes<-resid(lm.sol)
>y.fit<-predict(lm.sol)
>y.rst<-rstandard(lm.sol)
>plot(y.yes~y.fit)
>plot(y.rst~y.fit)
残差并非是等方差的。
修正模型,对相应变量Y做开方。
>lm.new<-update(lm.sol,sqrt(.)~.)
>summary(lm.new)
Call:
lm(formula = sqrt(y) ~x)
Residuals:
-1.54255 -0.45280-0.01177
Coefficients:
(Intercept)
x
---
Signif.codes:
Residual standard error: 0.7206on 26 degrees of freedom
Multiple R-squared:0.6806,
F-statistic: 55.41 on 1 and 26DF,
此时所有参数和方程均通过检验。
对新模型做标准化残差图,情况有所改善,不过还是存在一个离群值。第24和第28个值存在问题。
Ex6.4
>toothpaste<-data.frame( X1=c(-0.05,0.25,0.60,0,0.20, 0.15,-0.15, 0.15,0.10,0.40,0.45,0.35,0.30,0.50,0.50,0.40,-0.05,-0.05,-0.10,0.20,0.10,0.50,0.60,-0.05,0,0.05,0.55),X2=c(5.50,6.75,7.25,5.50,6.50,6.75,5.25,6.00,6.25,7.00,6.90,6.80,6.80,7.10,7.00,6.80,6.50,6.25,6.00,6.50,7.00,6.80,6.80,6.50,5.75,5.80,6.80),Y=c(7.38,8.51,9.52,7.50,8.28,8.75,7.10,8.00,8.15,9.10,8.86,8.90,8.87,9.26,9.00,8.75,7.95,7.65,7.27,8.00,8.50,8.75,9.21,8.27,7.67,7.93,9.26))
>lm.sol<-lm(Y~X1+X2,data=toothpaste);summary(lm.sol)
Call:
lm(formula = Y ~ X1 + X2, data= toothpaste)
Residuals:
-0.37130-0.10114
Coefficients:
(Intercept)
X1
X2
---
Signif.codes:
Residual standard error: 0.1767on 24 degrees of freedom
Multiple R-squared:0.9378,
F-statistic:
回归诊断:
>influence.measures(lm.sol)
Influence measuresof
1
2
3
4
5
6
7
8
9
10 -0.07830
11
12 -0.03114
13 -0.09236-0.03801
14 -0.02650
15
16 -0.00285-0.06185
17
18
19
20
21
22 -0.17959-0.39016
23
24 -0.54830-0.74197
25
26
27
>source("Reg_Diag.R");Reg_Diag(lm.sol) #薛毅老师自己写的程序
1
2
3
4
5
6
7
8
9
10
11-0.13874451
12
13
14
15-0.13650951
16-0.11097950
17-0.03939381
18-0.18593575
19-0.33609591
20-0.37130271
21-0.02545527
22-0.26374306
23
24
25
26
27
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
为什么两种方法检测结果不一样呢...不继续了
Ex6.5
>cement<-data.frame(X1=c( 7,
>xx<-cor(cement[1:4])
>kappa(xx,exact=T)
[1] 1376.881
大于1000,认为有严重的多重共线性。
>eigen(xx)
$values
[1] 2.235704035 1.5760660700.186606149 0.001623746
$vectors
[1,]-0.4759552
[2,] -0.5638702 -0.4139315-0.3144204 0.6417561
[3,]
[4,]
即2.235704035X1+1.576066070X2+0.186606149X3+0.001623746X4=0
可以忽略X4项,可以看出X1,X2,X3存在共线性。
删去X3和X4后:
>xx<-cor(cement[1:2])
>kappa(xx,exact=T)
[1] 1.59262
共线性消失了。
如果去掉X3呢?
>cement1<-cbind(cement$X1,cement$X2,cement$X4)
>xx<-cor(cement1)
>kappa(xx,exact=T)
[1] 77.25113
如果去掉X4呢?
>xx<-cor(cement[1:3])
>kappa(xx,exact=T)
[1] 11.12112
看起来去掉X3和X4是合理的。
Ex6.6
>infection<-data.frame(x1<-c(0,1,0,0,0,1,1,1),x2<-c(0,0,1,0,1,0,1,1),x3<-c(0,0,0,1,1,1,0,1),success<-c(0,0,23,8,28,0,11,1),fail<-c(9,0,3,32,30,2,87,17))
>infection$Ymat<-cbind(infection$success,infection$fail)
>glm.sol<-glm(Ymat~x1+x2+x3,family=binomial,data=infection)
>summary(glm.sol)
Call:
glm(formula = Ymat ~ x1 + x2 +x3, family = binomial, data = infection)
DevianceResiduals:
-2.56229
Coefficients:
(Intercept)
x1
x2
x3
---
Signif.codes:
(Dispersion parameter forbinomial family taken to be 1)
Residual deviance:10.997
AIC: 36.178
Number of Fisher Scoringiterations: 5
回归模型为
P=exp(-0.8207-3.2544x1+2.0299x2-1.0720x3)/(1+exp(-0.8207-3.2544x1+2.0299x2-1.0720x3))
Ex6.7
(1)
>x<-c(rep(0:6,rep(2,7)))
>y<-c(508.1,498.4,568.2,577.3,651.7,657.0,713.4,697.5,755.3,758.9,787.6,792.1,841.4,831.8)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)
Call:
lm(formula = y ~ 1 +x)
Residuals:
-25.400-11.484
Coefficients:
(Intercept)
x
---
Signif.codes:
Residual standard error: 17.59on 12 degrees of freedom
Multiple R-squared:0.9785,
F-statistic: 545.5 on 1 and 12DF,
线性回归模型为y=523.800+54.893x,通过t检验和F检验。
(2)
>lm.sol<-lm(y~1+x+I(x^2));summary(lm.sol)
Call:
lm(formula = y ~ 1 + x +I(x^2))
Residuals:
-10.6643
Coefficients:
(Intercept)502.5560
x
I(x^2)
---
Signif.codes:
Residual standard error: 7.858on 11 degrees of freedom
Multiple R-squared:0.9961,
F-statistic:
多项式回归模型为:
y=502.5560+80.3857x-4.2488x^2,通过t检验和F检验。
(3)作散点图和拟合曲线:
>plot(x,y)
>xfit<-seq(0,6,0.01)
>yfit<-predict(lm.sol,data.frame(x=xfit))
>lines(xfit,yfit)
Ex6.8
读入数据:
>cancer<-read.table("data",header=T)
>cancer
1
2
3
4
5
6
7
8
9
10 30 53
11 80 43 12
12 40 55
13 60 66 25
14 40 67 23
15 20 61 19
16 50 63
17 50 66 16
18 40 68 12
19 80 41 12
20 70 53
21 60 37 13
22 90 54 12
23 50 52
24 70 50
25 20 65 21
26 80 52 28
27 60 70 13
28 50 40 13
29 70 36 22
30 40 44 36
31 30 54
32 30 59 87
33 40 69
34 60 50 22
35 80 62
36 70 68 15
37 30 39
38 60 49 11
39 80 64 10
40 70 67 18
>glm.sol<-glm(y~x1+x2+x3+x4+x5,family=binomial,data=cancer);summary(glm.sol)
Call:
glm(formula = y ~ x1 + x2 + x3+ x4 + x5, family = binomial,
DevianceResiduals:
-1.71500
Coefficients:
(Intercept)-7.01140
x1
x2
x3
x4
x5
---
Signif.codes:
(Dispersion parameter forbinomial family taken to be 1)
Residual deviance:28.392
AIC: 40.392
Number of Fisher Scoringiterations: 6
有的系数并不显著。
下面做逐步回归:
>glm.new<-step(glm.sol)
Start:
y ~ x1 + x2 + x3 + x4 +x5
-x3
-x2
-x5
<none>
-x4
-x1
Step:
y ~ x1 + x2 + x4 +x5
-x2
-x5
<none>
-x4
-x1
Step:
y ~ x1 + x4 +x5
-x5
<none>
-x4
-x1
Step:
y ~ x1 + x4
<none>
-x4
-x1
只留下x1和x4两个变量。
>summary(glm.new)
Call:
glm(formula = y ~ x1 + x4,family = binomial, data = cancer)
DevianceResiduals:
-1.4825
Coefficients:
(Intercept)-6.13755
x1
x4
---
Signif.codes:
(Dispersion parameter forbinomial family taken to be 1)
Residual deviance:29.073
AIC: 35.073
Number of Fisher Scoringiterations: 6
回归方程为
P=exp(-6.13755+0.09759x1-1.12524x4)/(1+exp(-6.13755+0.09759x1-1.12524x4))
概率估计略。
Ex6.9
我表示不想做了...
我没弄明白nls()函数里所说的start即初始值怎么设置。好像可以随便设置,只要保证函数收敛即可??
0 0
- 统计建模与R软件第六章习题…
- 统计建模与R软件第三章习题…
- 统计建模与R软件第二章习题…
- 统计建模与R软件第四章习题…
- 统计建模与R软件第五章习题…
- 统计建模与R软件第七章习题…
- 统计建模与R软件-第三章习题答案
- 《统计建模与R软件》练习答案-第2章
- 统计建模与R软件第二章作业
- 统计建模与R软件(绪论)
- 『原创』统计建模与R软件-第四章 参数估计
- 第六章 统计与分组
- 统计软件R
- R统计软件资源
- R软件的使用-习题
- 统计软件第六次作业
- 第六章习题汇总
- C++第六章 习题
- C语言const:禁止修改变量的值
- 统计建模与R软件第二章习题…
- 统计建模与R软件第四章习题…
- 隐马尔科夫模型
- 统计建模与R软件第五章习题…
- 统计建模与R软件第六章习题…
- ROS:两个节点同时具有发布和订阅图像信息的功能
- 统计建模与R软件第七章习题…
- 上海市劳动和社会保障局关于建立人…
- 微信小程序-第一章-简述
- ftl实现文档WORD导出
- Sublime Text 3 编译并运行Java程序
- 《劳动力市场工资指导价位制度和企…
- Linux目录结构