Relearning Statistics, Chapter 14: Simple Linear Regression

Source: Internet · Editor: 程序博客网 · Time: 2024/04/30 04:06

14.1 Simple Linear Regression Model

Simple Linear Regression Model: y = β0 + β1 x + ε

  • β0 and β1 are referred to as the parameters of the model.
  • ε is a random variable referred to as the error term; it accounts for the variability in y that cannot be explained by the linear relationship between x and y.

Simple Linear Regression Equation: E(y) = β0 + β1 x
Estimated Simple Linear Regression Equation: ŷ = b0 + b1 x

14.2 Least Squares Method

It is a procedure for using sample data to find the estimated regression equation.
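
As a minimal sketch of the least squares method: the procedure picks b0 and b1 to minimize Σ(yi − ŷi)², which gives b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1 x̄. The function name and the ten-point sample below are illustrative assumptions, not taken from these notes.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical sample data (an assumption for illustration only).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = fit_simple_linear_regression(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

For this sample the fitted line has slope 5 and intercept 60.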

14.3 Coefficient of Determination

How can we show that the model we just estimated fits the data well?

SSE: sum of squares due to error, SSE = Σ(yi − ŷi)²
SST: total sum of squares, SST = Σ(yi − ȳ)²
SSR: sum of squares due to regression, SSR = Σ(ŷi − ȳ)²

SST = SSR + SSE

Coefficient of determination: r² = SSR / SST
Correlation coefficient: r = (sign of b1) · √r²

The correlation coefficient applies only to a linear relationship between two variables.
The coefficient of determination can also be used for nonlinear relationships and for relationships that have two or more independent variables.
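
The quantities in this section can be computed directly from a fitted line. The sketch below uses the standard definitions (SSE from the residuals, SST from deviations about ȳ, SSR from fitted values about ȳ) on a hypothetical ten-point sample; the data and variable names are assumptions for illustration.

```python
import math

# Hypothetical sample data (an assumption for illustration only).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation

r_squared = ssr / sst
# The correlation coefficient takes the sign of the slope b1.
r = math.copysign(math.sqrt(r_squared), b1)

print(f"SSE = {sse:.1f}, SST = {sst:.1f}, SSR = {ssr:.1f}")
print(f"r^2 = {r_squared:.4f}, r = {r:.4f}")
```

An r² near 0.90 here means that roughly 90% of the variability in y is explained by the linear relationship with x.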

14.4 Model Assumptions

An important step in determining whether the assumed model is appropriate involves testing for the significance of the relationship.

14.5 Test For Significance

Estimate of σ²
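
For reference, the standard estimator of σ² in simple linear regression is the mean square error. The divisor is n − 2 because two parameters (β0 and β1) are estimated from the sample; s is usually called the standard error of the estimate.

```latex
s^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n-2},
\qquad
s = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n-2}}
```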

t-Test

H0: β1 = 0
H1: β1 ≠ 0
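
A sketch of how this test can be carried out: the test statistic is t = b1 / s_b1, where s_b1 = s / √Σ(xi − x̄)² is the estimated standard deviation of b1, and H0 is rejected when |t| exceeds the critical value with n − 2 degrees of freedom. The function name, the sample data, and the choice of α = 0.01 are assumptions for illustration; the critical value is passed in from a t table because the Python standard library has no t distribution.

```python
import math


def slope_t_test(x, y, t_critical):
    """Return (b1, s_b1, t, reject_h0) for the two-tailed test of H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate
    s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
    t = b1 / s_b1
    return b1, s_b1, t, abs(t) > t_critical


# Hypothetical sample data (an assumption for illustration only).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

# 3.355 is the t-table value for alpha/2 = 0.005 with n - 2 = 8 degrees of freedom.
b1, s_b1, t, reject = slope_t_test(x, y, t_critical=3.355)
print(f"b1 = {b1:.2f}, s_b1 = {s_b1:.4f}, t = {t:.2f}, reject H0: {reject}")
```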

Confidence Interval for β1

b1 ± t · s_b1 = 5 ± 1.95
Because the entire interval (3.05 to 6.95) lies above 0, we can reject H0.
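
Written out in full, the standard form of this interval is shown below, where t_{α/2} is taken from the t distribution with n − 2 degrees of freedom and s is the standard error of the estimate. H0: β1 = 0 is rejected at level α whenever the interval does not contain 0.

```latex
b_1 \pm t_{\alpha/2}\, s_{b_1},
\qquad
s_{b_1} = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```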

F Test

The F test gives the same result as the t test when there is only one independent variable. With more than one independent variable, only the F test can be used to test for an overall significant relationship.

Question: why is MSR/MSE close to 1 when β1 = 0, and why does it follow an F distribution?
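
One way to approach this question: when β1 = 0, both MSR = SSR/1 and MSE = SSE/(n − 2) are unbiased estimates of σ², so their ratio should hover around 1. With normally distributed errors, SSR/σ² and SSE/σ² are independent chi-square variables with 1 and n − 2 degrees of freedom, which is exactly the construction of an F(1, n − 2) variable. The simulation below is a rough check of that claim, not a proof; the sample size, error standard deviation, seed, and function name are all assumptions.

```python
import random


def f_statistic(x, y):
    """Return MSR / MSE for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    msr = ssr / 1          # one independent variable
    mse = sse / (n - 2)
    return msr / mse


random.seed(0)
n = 30
x = [float(i) for i in range(n)]

f_values = []
for _ in range(2000):
    # Generate y with beta1 = 0: y does not depend on x at all.
    y = [10.0 + random.gauss(0.0, 3.0) for _ in x]
    f_values.append(f_statistic(x, y))

mean_f = sum(f_values) / len(f_values)
# 4.20 is the F-table value for alpha = 0.05 with (1, 28) degrees of freedom.
share_above_critical = sum(f > 4.20 for f in f_values) / len(f_values)

print(f"mean F = {mean_f:.3f}")
print(f"share of F values above 4.20 = {share_above_critical:.3f}")
```

The average F should come out slightly above 1 (the theoretical mean of F(1, n − 2) is (n − 2)/(n − 4)), and about 5% of the simulated values should exceed the 0.05 critical value.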


Cautions:

Rejecting H0 does not enable us to conclude that the relationship between x and y is linear. We can state only that x and y are related and that a linear relationship explains a significant portion of the variability in y over the range of values for x observed in the sample.

14.6 Using the Estimated Regression Equation for Estimation and Prediction

Point estimation: substitute the given value of x directly into the estimated regression equation to compute ŷ.

Confidence Interval (for the mean value of y at a given x):
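
In its standard form, with x* the given value of x, ŷ* = b0 + b1 x* the point estimate, and t_{α/2} based on n − 2 degrees of freedom, the interval is:

```latex
\hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*},
\qquad
s_{\hat{y}^*} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

The interval is narrowest at x* = x̄ and widens as x* moves away from the mean of the observed x values.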


Prediction Interval (for an individual value of y at a given x):
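
The prediction interval has the same center ŷ* as the confidence interval for the mean, but its standard deviation carries an extra σ² term for the individual observation: s_pred = s · √(1 + 1/n + (x* − x̄)² / Σ(xi − x̄)²). The sketch below computes both margins side by side so the difference is visible. The function name, sample data, x* = 10, and the 95% level are assumptions for illustration; the t value comes from a table.

```python
import math


def interval_margins(x, y, x_star, t_critical):
    """Return (y_hat_star, ci_margin, pi_margin) at x = x_star."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    y_hat_star = b0 + b1 * x_star
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    s_mean = s * math.sqrt(core)       # for the mean value of y at x_star
    s_pred = s * math.sqrt(1 + core)   # for an individual value of y at x_star
    return y_hat_star, t_critical * s_mean, t_critical * s_pred


# Hypothetical sample data (an assumption for illustration only).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

# 2.306 is the t-table value for alpha/2 = 0.025 with n - 2 = 8 degrees of freedom.
y_hat_star, ci_margin, pi_margin = interval_margins(x, y, x_star=10, t_critical=2.306)
print(f"point estimate: {y_hat_star:.2f}")
print(f"95% confidence interval: {y_hat_star:.2f} +/- {ci_margin:.2f}")
print(f"95% prediction interval: {y_hat_star:.2f} +/- {pi_margin:.2f}")
```

The prediction margin is always the larger of the two, because predicting one individual value is less certain than estimating a mean.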


14.8 Residual Analysis: Validating Model Assumptions

The four assumptions about the error term ε:
1. E(ε) = 0.
2. The variance of ε, denoted by σ², is the same for all values of x.
3. The values of ε are independent.
4. The error term ε has a normal distribution.

Residual Plot Against x

[Figure: plot of residuals against x]

Standardized Residuals


In the standardized residual plot, approximately 95% of the standardized residuals lie between -2 and +2, as expected when the error term is normally distributed.
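
A sketch of how standardized residuals can be computed: each residual yi − ŷi is divided by its own standard deviation s · √(1 − hi), where hi = 1/n + (xi − x̄)² / Σ(xj − x̄)² is the leverage of observation i. The function name and sample data are assumptions for illustration.

```python
import math


def standardized_residuals(x, y):
    """Return the standardized residual of each observation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    result = []
    for xi, yi in zip(x, y):
        residual = yi - (b0 + b1 * xi)
        leverage = 1 / n + (xi - x_bar) ** 2 / sxx
        # Standard deviation of this residual is s * sqrt(1 - leverage).
        result.append(residual / (s * math.sqrt(1 - leverage)))
    return result


# Hypothetical sample data (an assumption for illustration only).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

z = standardized_residuals(x, y)
share_within_2 = sum(abs(v) <= 2 for v in z) / len(z)
print([round(v, 4) for v in z])
print(f"share within [-2, +2]: {share_within_2:.2f}")
```

Any observation whose standardized residual falls outside [-2, +2] is a candidate outlier worth a closer look.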
[Figure: standardized residual plot]

Normal Probability Plot

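Plot the standardized residuals against the normal scores. If the normality assumption holds, the points cluster closely around a 45-degree line passing through the origin.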
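
The pairs for such a plot can be sketched as follows. Exact normal scores are expected values of standard normal order statistics and usually come from a table; here they are approximated with Blom's formula, Φ⁻¹((i − 0.375) / (n + 0.25)), which is a substitute technique and an assumption of this sketch. The function names and sample data are also illustrative.

```python
import math
from statistics import NormalDist


def standardized_residuals(x, y):
    """Return the standardized residual of each observation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return [
        (yi - (b0 + b1 * xi)) / (s * math.sqrt(1 - (1 / n + (xi - x_bar) ** 2 / sxx)))
        for xi, yi in zip(x, y)
    ]


def approximate_normal_scores(n):
    """Approximate the expected order statistics of n standard normal draws."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


# Hypothetical sample data (an assumption for illustration only).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

scores = approximate_normal_scores(len(x))
# Pair the smallest score with the smallest standardized residual, and so on.
pairs = list(zip(scores, sorted(standardized_residuals(x, y))))
for score, residual in pairs:
    print(f"{score:6.2f}  {residual:6.2f}")
```

Plotting the first column on the horizontal axis and the second on the vertical axis gives the normal probability plot; large departures from the 45-degree line suggest the normality assumption is questionable.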
[Figure: normal probability plot]

14.9 Residual Analysis: Outliers and Influential Observations

Detecting Outliers

Identify any observation with a standardized residual of less than -2 or greater than +2 as an unusual observation.

Detecting Influential Observations
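
One standard tool for this is the leverage of each observation, hi = 1/n + (xi − x̄)² / Σ(xj − x̄)², which measures how far xi lies from the mean of the x values. The sketch below flags observations with hi > 6/n, a common rule of thumb (3p/n with p = 2 parameters); the threshold, the function name, and the x values are assumptions for illustration, and other texts use different cutoffs.

```python
def leverages(x):
    """Return the leverage h_i of each observation in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical x values with one observation far from the rest (an assumption).
x = [10, 10, 15, 20, 20, 25, 70]

h = leverages(x)
threshold = 6 / len(x)   # rule-of-thumb cutoff, assumed for this sketch
flagged = [i for i, hi in enumerate(h) if hi > threshold]

for i, hi in enumerate(h):
    print(f"observation {i}: x = {x[i]:>3}, leverage = {hi:.4f}")
print(f"high-leverage observations (h > {threshold:.3f}): {flagged}")
```

Leverage depends only on the x values, so a high-leverage point is not automatically influential; it becomes influential when removing it noticeably changes the fitted line.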
