统计建模与R软件第六章习题…

来源:互联网 发布:mac 上不了网 编辑:程序博客网 时间:2024/06/04 23:26
原文地址:统计建模与R软件第六章习题答案(回归分析)作者:蘓木柒
Ex6.1
(1)
> x<- c(5.1, 3.5, 7.1, 6.2, 8.8, 7.8, 4.5, 5.6, 8.0,6.4)
> y<- c(1907, 1287, 2700, 2373, 3260, 3000, 1947, 2273,3113,2493)
>plot(x,y)
[转载]统计建模与R软件第六章习题答案(回归分析)
由此判断,Y和X有线性关系。
(2)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)

Call:
lm(formula = y ~ 1 +x)

Residuals:
    Min      1Q  Median      3Q     Max
-128.591 -70.978  -3.727   49.263 167.228

Coefficients:
           Estimate Std. Error t valuePr(>|t|)   
(Intercept)  140.95    125.11  1.127   0.293   
           364.18     19.26  18.908 6.33e-08 ***
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 96.42on 8 degrees of freedom
Multiple R-squared:0.9781,    Adjusted R-squared: 0.9754
F-statistic: 357.5 on 1 and 8DF,  p-value: 6.33e-08
回归方程为Y=140.95+364.18X
(3)
β1项很显著,但常数项β0不显著。
回归方程很显著。
(4)
> new<- data.frame(x=7)
>lm.pred<-predict(lm.sol,new,interval="prediction")
>lm.pred
      fit     lwr     upr
1 2690.227 2454.9712925.484
故Y(7)= 2690.227,[2454.971,2925.484]

Ex6.2
(1)
>pho<-data.frame(x1<-c(0.4,0.4,3.1,0.6,4.7,1.7,9.4,10.1,11.6,12.6,10.9,23.1,23.1,21.6,23.1,1.9,26.8,29.9),x2 <-c(52,34,19,34,24,65,44,31,29,58,37,46,50,44,56,36,58,51), x3<-c(158,163,37,157,59,123,46,117,173,112,111,114,134,73,168,143,202,124),y <-c(64,60,71,61,54,77,81,93,93,51,76,96,77,93,95,54,168,99))
>lm.sol<-lm(y~x1+x2+x3,data=pho)
>summary(lm.sol)

Call:
lm(formula = y ~ x1 + x2 + x3,data = pho)

Residuals:
   Min     1Q Median     3Q    Max
-27.575-11.160  -2.799 11.574  48.808

Coefficients:
           Estimate Std. Error t valuePr(>|t|)  
(Intercept) 44.9290   18.3408   2.450 0.02806 *
x1           1.8033    0.5290   3.409 0.00424 **
x2          -0.1337    0.4440  -0.301 0.76771  
x3           0.1668    0.1141   1.462 0.16573  
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 19.93on 14 degrees of freedom
Multiple R-squared:0.551,     Adjusted R-squared: 0.4547
F-statistic: 5.726 on 3 and 14DF,  p-value: 0.009004
回归方程为y=44.9290+1.8033x1-0.1337x2+0.1668x3
(2)
回归方程显著,但有些回归系数不显著。
(3)
>lm.step<-step(lm.sol)
Start: AIC=111.2
y ~ x1 + x2 +x3

      Df Sum ofSq    RSS    AIC
-x2       36.0  5599.4  109.3
<none>              5563.4   111.2
-x3      849.8  6413.1  111.8
-x1      4617.810181.2   120.1

Step: AIC=109.32
y ~ x1 + x3

      Df Sum ofSq    RSS    AIC
<none>              5599.4   109.3
-x3      833.2  6432.6  109.8
-x1      5169.510768.9   119.1

>summary(lm.step)

Call:
lm(formula = y ~ x1 + x3, data= pho)

Residuals:
   Min     1Q Median     3Q    Max
-29.713-11.324  -2.953 11.286  48.679

Coefficients:
           Estimate Std. Error t valuePr(>|t|)  
(Intercept) 41.4794   13.8834   2.988 0.00920 **
x1           1.7374    0.4669   3.721 0.00205 **
x3           0.1548    0.1036   1.494 0.15592  
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 19.32on 15 degrees of freedom
Multiple R-squared:0.5481,    Adjusted R-squared: 0.4878
F-statistic: 9.095 on 2 and 15DF,  p-value: 0.002589
x3仍不够显著。
再用drop1函数做逐步回归。
>drop1(lm.step)
Single termdeletions

Model:
y ~ x1 + x3
      Df Sum ofSq    RSS    AIC
<none>              5599.4   109.3
x1        5169.510768.9   119.1
x3        833.2  6432.6  109.8
可以考虑再去掉x3.
>lm.opt<-lm(y~x1,data=pho);summary(lm.opt)

Call:
lm(formula = y ~ x1, data =pho)

Residuals:
   Min     1Q Median     3Q    Max
-31.486 -8.282  -1.674  5.623  59.337

Coefficients:
           Estimate Std. Error t valuePr(>|t|)   
(Intercept) 59.2590    7.4200   7.986 5.67e-07***
x1           1.8434    0.4789   3.849 0.00142 **
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 20.05on 16 degrees of freedom
Multiple R-squared:0.4808,    Adjusted R-squared: 0.4484
F-statistic: 14.82 on 1 and 16DF,  p-value: 0.001417
皆显著。

Ex6.3
>x<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,6,6,6,7,7,7,8,8,8,9,11,12,12,12)
>y<-c(0.6,1.6,0.5,1.2,2.0,1.3,2.5,2.2,2.4,1.2,3.5,4.1,5.1,5.7,3.4,9.7,8.6,4.0,5.5,10.5,17.5,13.4,4.5,30.4,12.4,13.4,26.2,7.4)
>plot(x,y)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)

Call:
lm(formula = y ~ 1 +x)

Residuals:
   Min     1Q Median     3Q    Max
-9.8413 -2.3369-0.0214  1.0592 17.8320

Coefficients:
           Estimate Std. Error t valuePr(>|t|)   
(Intercept) -1.4519    1.8353 -0.791   0.436   
           1.5578    0.2807   5.549 7.93e-06***
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 5.168on 26 degrees of freedom
Multiple R-squared:0.5422,    Adjusted R-squared: 0.5246
F-statistic: 30.8 on 1 and 26 DF,  p-value:7.931e-06
线性回归方程为 y=-1.4519+1.5578x,通过F检验。 常数项参数未通过t 检验。
>abline(lm.sol)
[转载]统计建模与R软件第六章习题答案(回归分析)
>y.yes<-resid(lm.sol)
>y.fit<-predict(lm.sol)
>y.rst<-rstandard(lm.sol)
>plot(y.yes~y.fit)
[转载]统计建模与R软件第六章习题答案(回归分析)

>plot(y.rst~y.fit)
[转载]统计建模与R软件第六章习题答案(回归分析)

残差并非是等方差的。
修正模型,对相应变量Y做开方。
>lm.new<-update(lm.sol,sqrt(.)~.)
>summary(lm.new)

Call:
lm(formula = sqrt(y) ~x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.54255 -0.45280-0.01177  0.34925 2.12486

Coefficients:
           Estimate Std. Error t valuePr(>|t|)   
(Intercept) 0.76650   0.25592   2.995 0.00596 **
          0.29136   0.03914   7.444 6.64e-08***
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 0.7206on 26 degrees of freedom
Multiple R-squared:0.6806,    Adjusted R-squared: 0.6684
F-statistic: 55.41 on 1 and 26DF,  p-value: 6.645e-08
此时所有参数和方程均通过检验。
对新模型做标准化残差图,情况有所改善,不过还是存在一个离群值。第24和第28个值存在问题。
[转载]统计建模与R软件第六章习题答案(回归分析)

Ex6.4
>toothpaste<-data.frame( X1=c(-0.05,0.25,0.60,0,0.20, 0.15,-0.15, 0.15,0.10,0.40,0.45,0.35,0.30,0.50,0.50,0.40,-0.05,-0.05,-0.10,0.20,0.10,0.50,0.60,-0.05,0,0.05,0.55),X2=c(5.50,6.75,7.25,5.50,6.50,6.75,5.25,6.00,6.25,7.00,6.90,6.80,6.80,7.10,7.00,6.80,6.50,6.25,6.00,6.50,7.00,6.80,6.80,6.50,5.75,5.80,6.80),Y=c(7.38,8.51,9.52,7.50,8.28,8.75,7.10,8.00,8.15,9.10,8.86,8.90,8.87,9.26,9.00,8.75,7.95,7.65,7.27,8.00,8.50,8.75,9.21,8.27,7.67,7.93,9.26))
>lm.sol<-lm(Y~X1+X2,data=toothpaste);summary(lm.sol)

Call:
lm(formula = Y ~ X1 + X2, data= toothpaste)

Residuals:
    Min      1Q  Median      3Q     Max
-0.37130-0.10114  0.03066 0.10016  0.30162

Coefficients:
           Estimate Std. Error t valuePr(>|t|)   
(Intercept)  4.0759    0.6267   6.504 1.00e-06***
X1           1.5276    0.2354   6.489 1.04e-06***
X2           0.6138    0.1027   5.974 3.63e-06***
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 0.1767on 24 degrees of freedom
Multiple R-squared:0.9378,    Adjusted R-squared: 0.9327
F-statistic:  181 on 2 and 24 DF,  p-value:3.33e-15
回归诊断:
>influence.measures(lm.sol)
Influence measuresof
        lm(formula = Y ~ X1 + X2, data = toothpaste) :

    dfb.1_  dfb.X1  dfb.X2   dffitcov.r  cook.d    hatinf
 0.00908  0.00260 -0.00847  0.01211.366 5.11e-050.1681   
 0.06277  0.04467 -0.06785 -0.1244 1.159 5.32e-030.0537   
-0.02809  0.07724 0.02540  0.1858 1.283 1.19e-020.1386   
 0.11688  0.05055 -0.11078  0.14041.377 6.83e-03 0.1843  *
 0.01167  0.01887 -0.01766 -0.1037 1.141 3.69e-030.0384   
-0.43010-0.42881  0.45774  0.6061 0.8141.11e-010.0936   
 0.07840  0.01534 -0.07284  0.10821.481 4.07e-03 0.2364  *
 0.01577  0.00913 -0.01485  0.02081.237 1.50e-040.0823   
 0.01127 -0.02714 -0.00364  0.1071 1.156 3.95e-030.0466   
10 -0.07830 0.00171  0.08052  0.1890 1.1551.22e-020.0726   
11  0.00301-0.09652 -0.00365 -0.2281 1.127 1.76e-020.0735   
12 -0.03114 0.01848  0.03459  0.1542 1.1328.12e-030.0514   
13 -0.09236-0.03801  0.09940  0.2201 1.0711.62e-020.0522   
14 -0.02650 0.03434  0.02606  0.1179 1.2354.81e-030.0956   
15  0.00968-0.11445 -0.00857 -0.2545 1.150 2.19e-020.0910   
16 -0.00285-0.06185  0.00098 -0.1608 1.146 8.83e-030.0594   
17 0.07201  0.09744 -0.07796 -0.1099 1.364 4.19e-030.1731   
18 0.15132  0.30204 -0.17755 -0.3907 1.087 5.04e-020.1085   
19 0.07489  0.47472 -0.12980 -0.7579 0.731 1.66e-010.1092   
20 0.05249  0.08484 -0.07940 -0.4660 0.625 6.11e-020.0384   *
21 0.07557  0.07284 -0.07861 -0.0880 1.471 2.69e-030.2304   *
22 -0.17959-0.39016  0.18241 -0.5494 0.912 9.41e-020.1022   
23 0.06026  0.10607 -0.06207  0.12511.374 5.42e-030.1804   
24 -0.54830-0.74197  0.59358  0.8371 0.9142.13e-010.1731   
25 0.08541  0.01624 -0.07775  0.13141.249 5.97e-030.1069   
26 0.32556  0.11734 -0.30200  0.44801.018 6.49e-020.1033   
27 0.17243  0.32754 -0.17676  0.41271.148 5.66e-020.1369   
>source("Reg_Diag.R");Reg_Diag(lm.sol) #薛毅老师自己写的程序
     residual s1   standards2    student s3 hat_matrixs4     DFFITS s5
 0.00443843    0.02753865    0.02695925   0.16811819    0.01211949  
-0.09114255   -0.53021138   -0.52211469   0.05369239   -0.12436727  
 0.07726887    0.47112863    0.46335666   0.13857353    0.18584310  
 0.04805665    0.30111062    0.29532912   0.18427663    0.14036860  
-0.09130271   -0.52689847   -0.51881406   0.03838430   -0.10365442  
 0.30162101    1.79287913    1.88596579   0.09362223    0.60613406  
 0.03066005    0.19855842    0.19453763   0.23641540 0.10824626  
 0.01199519    0.07085860    0.06937393   0.08226537    0.02077047  
 0.08491891    0.49217591    0.48426323   0.04664158    0.10711246  
10 0.11625405    0.68315814    0.67537315   0.07261134    0.18897969  
11-0.13874451   -0.81570765   -0.80983786   0.07348894   -0.22807820  
12 0.11540228    0.67051940    0.66263761   0.05137589    0.15420864  
13 0.16178406    0.94041623    0.93806144   0.05219432    0.22013204  
14 0.06210727    0.36957277    0.36282531   0.09557411    0.11794546  
15-0.13650951   -0.81026658   -0.80428349   0.09101221   -0.25449541  
16-0.11097950   -0.64757782   -0.63955524   0.05943308   -0.16076716  
17-0.03939381   -0.24515626   -0.24029557   0.17309048   -0.10993940  
18-0.18593575   -1.11438446   -1.12029013   0.10845395   -0.39073410  
19-0.33609591   -2.01522068  * -2.16439284  *0.10922236   -0.75789180  *
20-0.37130271  * -2.14274943  *-2.33258738  *0.03838430   -0.46603012  
21-0.02545527   -0.16420856   -0.16084153   0.23042354  *-0.08801075  
22-0.26374306   -1.57517595   -1.62848498   0.10217431   -0.54936198  
23 0.04349338    0.27187605    0.26656251   0.18041800    0.12506702  
24 0.28060619    1.74627363    1.82969510   0.17309048    0.83711731  *
25 0.06459859    0.38683016    0.37987153   0.10691352    0.13143357  
26 0.21752520    1.29995371    1.31989945   0.10329116    0.44796770  
27 0.16987516    1.03474390    1.03633781   0.13685835    0.41266341  
  cooks_distance s6  COVRATIO s7
  5.108777e-05   1.3656752  
  5.316885e-03   1.1589547  
  1.190200e-02   1.2827036  
  6.827446e-03   1.3771332  
  3.693897e-03   1.1410104  
  1.106753e-01   0.8143179  
  4.068871e-03   1.4806452  *
  1.500251e-04   1.2372586  
  3.950358e-03   1.1560508  
10  1.218047e-02   1.1550557  
11  1.759216e-02   1.1271148  
12  8.116460e-03   1.1316638  
13  1.623390e-02   1.0710597  
14  4.811117e-03   1.2349272  
15  2.191171e-02   1.1501502  
16  8.832858e-03   1.1457602  
17  4.193532e-03   1.3637206  
18  5.035591e-02   1.0866343  
19  1.659840e-01   0.7313914  
20  6.109050e-02   0.6248838  
21  2.691197e-03   1.4714103  
22  9.412101e-02   0.9121786  
23  5.423856e-03   1.3735324  
24  2.127740e-01  *0.9139942  
25  5.971157e-03   1.2485557  
26  6.488529e-02   1.0178195  
27  5.658922e-02   1.1479080  
为什么两种方法检测结果不一样呢...不继续了

Ex6.5
>cement<-data.frame(X1=c( 7,  1, 11,11,  7, 11,  3, 1,  2, 21,  1, 11, 10),X2=c(26,29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),X3=c( 6,15,  8,  8, 6,  9, 17, 22, 18,  4,23,  9,  8),X4=c(60, 52, 20, 47,33, 22,  6, 44, 22, 26, 34, 12, 12),Y =c(78.5,74.3, 104.3,  87.6,  95.9, 109.2,102.7, 72.5, 93.1,115.9,  83.8, 113.3,109.4))
>xx<-cor(cement[1:4])
>kappa(xx,exact=T)
[1] 1376.881
大于1000,认为有严重的多重共线性。
>eigen(xx)
$values
[1] 2.235704035 1.5760660700.186606149 0.001623746

$vectors
          [,1]      [,2]      [,3]     [,4]
[1,]-0.4759552  0.5089794  0.67550020.2410522
[2,] -0.5638702 -0.4139315-0.3144204 0.6417561
[3,] 0.3940665 -0.6049691  0.63769110.2684661
[4,] 0.5479312  0.4512351 -0.19542100.6767340
即2.235704035X1+1.576066070X2+0.186606149X3+0.001623746X4=0
可以忽略X4项,可以看出X1,X2,X3存在共线性。
删去X3和X4后:
>xx<-cor(cement[1:2])
>kappa(xx,exact=T)
[1] 1.59262
共线性消失了。
如果去掉X3呢?
>cement1<-cbind(cement$X1,cement$X2,cement$X4)
>xx<-cor(cement1)
>kappa(xx,exact=T)
[1] 77.25113
如果去掉X4呢?
>xx<-cor(cement[1:3])
>kappa(xx,exact=T)
[1] 11.12112
看起来去掉X3和X4是合理的。

Ex6.6
>infection<-data.frame(x1<-c(0,1,0,0,0,1,1,1),x2<-c(0,0,1,0,1,0,1,1),x3<-c(0,0,0,1,1,1,0,1),success<-c(0,0,23,8,28,0,11,1),fail<-c(9,0,3,32,30,2,87,17))
>infection$Ymat<-cbind(infection$success,infection$fail)
>glm.sol<-glm(Ymat~x1+x2+x3,family=binomial,data=infection)
>summary(glm.sol)

Call:
glm(formula = Ymat ~ x1 + x2 +x3, family = binomial, data = infection)

DevianceResiduals:
                                                       
-2.56229  0.00000  1.49623  1.21563  -0.78520 -0.15231 -0.07162  0.26470 

Coefficients:
           Estimate Std. Error z valuePr(>|z|)   
(Intercept) -0.8207    0.4947  -1.659  0.0971 . 
x1          -3.2544    0.4813  -6.761 1.37e-11 ***
x2           2.0299    0.4553   4.459 8.25e-06***
x3          -1.0720    0.4254  -2.520  0.0117 * 
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

(Dispersion parameter forbinomial family taken to be 1)

   Null deviance: 83.491  on 6 degrees of freedom
Residual deviance:10.997  on 3  degrees offreedom
AIC: 36.178

Number of Fisher Scoringiterations: 5
回归模型为
P=exp(-0.8207-3.2544x1+2.0299x2-1.0720x3)/(1+exp(-0.8207-3.2544x1+2.0299x2-1.0720x3))

Ex6.7
(1)
>x<-c(rep(0:6,rep(2,7)))
>y<-c(508.1,498.4,568.2,577.3,651.7,657.0,713.4,697.5,755.3,758.9,787.6,792.1,841.4,831.8)
>lm.sol<-lm(y~1+x)
>summary(lm.sol)

Call:
lm(formula = y ~ 1 +x)

Residuals:
   Min     1Q Median     3Q    Max
-25.400-11.484  -3.779 14.629  24.921

Coefficients:
           Estimate Std. Error t valuePr(>|t|)   
(Intercept) 523.800     8.474   61.81 < 2e-16 ***
           54.893     2.350   23.36 2.26e-11***
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 17.59on 12 degrees of freedom
Multiple R-squared:0.9785,    Adjusted R-squared: 0.9767
F-statistic: 545.5 on 1 and 12DF,  p-value: 2.265e-11
线性回归模型为y=523.800+54.893x,通过t检验和F检验。
(2)
>lm.sol<-lm(y~1+x+I(x^2));summary(lm.sol)

Call:
lm(formula = y ~ 1 + x +I(x^2))

Residuals:
    Min      1Q  Median      3Q     Max
-10.6643 -5.6622 -0.4655  5.5000  10.6679

Coefficients:
           Estimate Std. Error t valuePr(>|t|)   
(Intercept)502.5560    4.8500 103.619  < 2e-16***
          80.3857    3.7861  21.232 2.81e-10 ***
I(x^2)      -4.2488    0.6063  -7.008 2.25e-05 ***
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

Residual standard error: 7.858on 11 degrees of freedom
Multiple R-squared:0.9961,    Adjusted R-squared: 0.9953
F-statistic: 1391 on 2 and 11 DF,  p-value:5.948e-14
多项式回归模型为:
y=502.5560+80.3857x-4.2488x^2,通过t检验和F检验。
(3)作散点图和拟合曲线:
>plot(x,y)
>xfit<-seq(0,6,0.01)
>yfit<-predict(lm.sol,data.frame(x=xfit))
>lines(xfit,yfit)
[转载]统计建模与R软件第六章习题答案(回归分析)

Ex6.8
读入数据:
>cancer<-read.table("data",header=T)
>cancer
   x1 x2x3 x4 x5 y
7064  11
6063  10
70 6511  1 0
40 6910  1 0
40 6358  1 0
7048  10
70 4811  1 0
8063  10
60 6314  1 0
10 30 53  1 0
11 80 43 12  1 0
12 40 55  1 0
13 60 66 25  1 1
14 40 67 23  1 0
15 20 61 19  1 0
16 50 63  1 0
17 50 66 16  1 0
18 40 68 12  1 0
19 80 41 12  1 1
20 70 53  1 1
21 60 37 13  1 0
22 90 54 12  0 1
23 50 52  0 1
24 70 50  0 1
25 20 65 21  0 0
26 80 52 28  0 1
27 60 70 13  0 0
28 50 40 13  0 0
29 70 36 22  0 0
30 40 44 36  0 0
31 30 54  0 0
32 30 59 87  0 0
33 40 69  0 0
34 60 50 22  0 0
35 80 62  0 0
36 70 68 15  0 0
37 30 39  0 0
38 60 49 11  0 0
39 80 64 10  0 1
40 70 67 18  0 1
>glm.sol<-glm(y~x1+x2+x3+x4+x5,family=binomial,data=cancer);summary(glm.sol)

Call:
glm(formula = y ~ x1 + x2 + x3+ x4 + x5, family = binomial,
   data = cancer)

DevianceResiduals:
    Min       1Q   Median       3Q      Max 
-1.71500 -0.66725 -0.22254  0.09936  2.23936 

Coefficients:
           Estimate Std. Error z valuePr(>|z|) 
(Intercept)-7.01140   4.47534 -1.567  0.1172 
x1          0.09994   0.04304  2.322   0.0202 *
x2          0.01415   0.04697  0.301  0.7631 
x3          0.01749   0.05458  0.320  0.7486 
x4         -1.08297   0.58721 -1.844   0.0651.
x5         -0.61309   0.96066 -0.638  0.5233 
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

(Dispersion parameter forbinomial family taken to be 1)

   Null deviance: 44.987  on 39 degrees of freedom
Residual deviance:28.392  on 34  degrees offreedom
AIC: 40.392

Number of Fisher Scoringiterations: 6
有的系数并不显著。
下面做逐步回归:
>glm.new<-step(glm.sol)
Start: AIC=40.39
y ~ x1 + x2 + x3 + x4 +x5

      Df Deviance   AIC
-x3     28.48438.484
-x2     28.48438.484
-x5     28.79938.799
<none>     28.392 40.392
-x4     32.64242.642
-x1     38.30648.306

Step: AIC=38.48
y ~ x1 + x2 + x4 +x5

      Df Deviance   AIC
-x2     28.56436.564
-x5     28.99336.993
<none>     28.484 38.484
-x4     32.70540.705
-x1     38.47846.478

Step: AIC=36.56
y ~ x1 + x4 +x5

      Df Deviance   AIC
-x5     29.07335.073
<none>     28.564 36.564
-x4     32.89238.892
-x1     38.47844.478

Step: AIC=35.07
y ~ x1 + x4

      Df Deviance   AIC
<none>     29.073 35.073
-x4     33.53537.535
-x1     39.13143.131
只留下x1和x4两个变量。
>summary(glm.new)

Call:
glm(formula = y ~ x1 + x4,family = binomial, data = cancer)

DevianceResiduals:
   Min      1Q  Median      3Q     Max 
-1.4825 -0.6617 -0.1877  0.1227  2.2844 

Coefficients:
           Estimate Std. Error z valuePr(>|z|) 
(Intercept)-6.13755   2.73844 -2.241   0.0250*
x1          0.09759   0.04079  2.393   0.0167 *
x4         -1.12524   0.60239 -1.868   0.0618.
---
Signif.codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 '' 1

(Dispersion parameter forbinomial family taken to be 1)

   Null deviance: 44.987  on 39 degrees of freedom
Residual deviance:29.073  on 37  degrees offreedom
AIC: 35.073

Number of Fisher Scoringiterations: 6
回归方程为
P=exp(-6.13755+0.09759x1-1.12524x4)/(1+exp(-6.13755+0.09759x1-1.12524x4))
概率估计略。

Ex6.9
我表示不想做了...
我没弄明白nls()函数里所说的start即初始值怎么设置。好像可以随便设置,只要保证函数收敛即可??









0 0