Relearning Statistics, Ch. 14: Simple Linear Regression
Source: Internet  Editor: 程序博客网  Time: 2024/04/30 04:06
14.1 Simple Linear Regression Model
Simple Linear Regression Model: y = β0 + β1 x + ε
- β0 and β1 are referred to as the parameters of the model.
- ε is a random variable referred to as the error term. It accounts for the variability in y that cannot be explained by the linear relationship between x and y.
Simple Linear Regression Equation: E(y) = β0 + β1 x
Estimated Simple Linear Regression Equation: ŷ = b0 + b1 x
14.2 Least Squares Method
The least squares method is a procedure for using sample data to find the estimated regression equation: it chooses b0 and b1 to minimize the sum of squared deviations Σ(yi − ŷi)².
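A minimal sketch of the least squares computation, using the standard closed-form solution b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1 x̄. The data and the helper name `least_squares` are assumptions for illustration; the notes do not include a data set.

```python
# Illustrative data (an assumption, not given in the notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]


def least_squares(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

The fitted line always passes through the point (x̄, ȳ), which is why b0 falls out of b1 directly.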
14.3 Coefficient of Determination
How do we show that the model we just estimated actually fits the data?
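SSE: sum of squares due to error, SSE = Σ(yi − ŷi)²
SST = SSR + SSE, where SST = Σ(yi − ȳ)² is the total sum of squares and SSR = Σ(ŷi − ȳ)² is the sum of squares due to regression
Coefficient of determination: r² = SSR / SST
Correlation Coefficient: r = (sign of b1) · √r²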
The correlation coefficient applies only to a linear relationship between two variables.
The coefficient of determination can also be used for nonlinear relationships and for relationships that have two or more independent variables.
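The quantities in this section can be checked numerically. The sketch below assumes the standard definitions SSE = Σ(yi − ŷi)², SST = Σ(yi − ȳ)², SSR = Σ(ŷi − ȳ)², and uses an illustrative data set that is not part of the notes.

```python
import math

# Illustrative data (an assumption, not given in the notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

# Least squares fit.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained

r2 = ssr / sst                          # coefficient of determination
r = math.copysign(math.sqrt(r2), b1)    # correlation takes the sign of b1

print(f"SSE={sse:.1f} SSR={ssr:.1f} SST={sst:.1f} r^2={r2:.4f} r={r:.4f}")
```

An r² near 1 means the regression line explains most of the variability in y; here SSR + SSE reproduces SST up to rounding.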
14.4 Model Assumptions
An important step in determining whether the assumed model is appropriate involves testing for the significance of the relationship.
14.5 Test For Significance
Estimate of σ²: s² = MSE = SSE / (n − 2)
t-Test
H0: β1 = 0
H1: β1 ≠ 0
Confidence Interval for β1
b1 ± t(α/2) · s_b1 = 5 ± 1.95
Because the whole interval (3.05 to 6.95) lies above 0, we can reject H0.
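A sketch of the t test and the confidence interval for β1. The data set is an assumption (the notes give none); it was picked because it reproduces the quoted figures, b1 = 5 with a margin of about 1.95 at 99% confidence. The critical value 3.355 is the t table entry for 8 degrees of freedom and an upper-tail area of 0.005, hard-coded here to keep the example free of third-party libraries.

```python
import math

# Assumed data (not in the notes); it yields b1 = 5 as quoted above.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)            # estimate of sigma^2 (MSE)
s_b1 = math.sqrt(s2 / sxx)    # estimated standard deviation of b1
t_stat = b1 / s_b1            # test statistic for H0: beta1 = 0

# Table value: t distribution, n - 2 = 8 degrees of freedom, alpha/2 = 0.005.
T_CRIT = 3.355
margin = T_CRIT * s_b1
ci = (b1 - margin, b1 + margin)
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f}, 99% CI for beta1 = {b1:.2f} +/- {margin:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The two checks agree by construction: the interval excludes 0 exactly when |t| exceeds the same critical value.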
F Test
The F test gives the same result as the t test if there is only one independent variable. With more than one independent variable, only the F test can be used to test for an overall significant relationship.
Question: why is MSR/MSE close to 1 when β1 = 0, and why does the ratio follow an F distribution?
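On the question above, the usual textbook argument is that MSE estimates σ² whether or not β1 = 0, while MSR estimates σ² only when β1 = 0, so under H0 the ratio of the two should be near 1. With normally distributed errors, SSR/σ² and SSE/σ² are independent chi-square variables with 1 and n − 2 degrees of freedom, which makes MSR/MSE an F(1, n − 2) variable. The sketch below computes F on an assumed data set (not from the notes) and checks that it equals t² in the one-variable case; 11.26 is the F table value for (1, 8) degrees of freedom at α = 0.01.

```python
import math

# Illustrative data (an assumption, not given in the notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

msr = ssr / 1          # one independent variable -> 1 degree of freedom
mse = sse / (n - 2)    # n - 2 degrees of freedom
f_stat = msr / mse

# Same hypothesis tested with t; in simple regression F equals t squared.
t_stat = b1 / math.sqrt(mse / sxx)

F_CRIT = 11.26  # table value: F(1, 8), upper-tail area 0.01
reject_h0 = f_stat > F_CRIT

print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}, reject H0: {reject_h0}")
```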
Cautions:
Rejecting H0: β1 = 0 does not enable us to conclude that the relationship between x and y is linear. We can state only that x and y are related and that a linear relationship explains a significant portion of the variability in y over the range of values for x observed in the sample.
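14.6 Using the Estimated Regression Equation for Estimation and Prediction
Point estimation: substitute the given x directly into the estimated regression equation to compute ŷ.
Confidence Interval (for the mean value of y at a given x*): ŷ* ± t(α/2) · s_ŷ*, where s_ŷ* = s · √(1/n + (x* − x̄)² / Σ(xi − x̄)²)
Prediction Interval (for an individual value of y at a given x*): ŷ* ± t(α/2) · s_pred, where s_pred = s · √(1 + 1/n + (x* − x̄)² / Σ(xi − x̄)²)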
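A sketch comparing the two intervals at a single point, assuming the standard formulas s_ŷ* = s·√(1/n + (x* − x̄)²/Σ(xi − x̄)²) for the mean response and s_pred = s·√(1 + 1/n + (x* − x̄)²/Σ(xi − x̄)²) for an individual response. The data, the choice x* = 10, and the 95% level are assumptions; 2.306 is the t table value for 8 degrees of freedom and an upper-tail area of 0.025.

```python
import math

# Illustrative data (an assumption, not given in the notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))   # standard error of the estimate

x_star = 10                     # hypothetical value of x to estimate at
y_star = b0 + b1 * x_star       # point estimate

h = 1 / n + (x_star - x_bar) ** 2 / sxx
s_mean = s * math.sqrt(h)       # std. dev. of the estimated mean response
s_pred = s * math.sqrt(1 + h)   # std. dev. for an individual response

T_CRIT = 2.306                  # table value: t, 8 d.f., alpha/2 = 0.025
mean_margin = T_CRIT * s_mean
pred_margin = T_CRIT * s_pred

print(f"point estimate:      {y_star:.1f}")
print(f"confidence interval: {y_star:.1f} +/- {mean_margin:.2f}")
print(f"prediction interval: {y_star:.1f} +/- {pred_margin:.2f}")
```

The prediction interval is always wider, because it carries the variability of an individual y around its mean in addition to the uncertainty in the estimated mean. Both intervals are narrowest at x* = x̄.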
14.8 Residual Analysis: Validating Model Assumptions
The four assumptions about the error term ε:
1. E(ε) = 0.
2. The variance of ε, denoted by σ², is the same for all values of x.
3. The values of ε are independent.
4. The error term ε has a normal distribution.
Residual Plot Against x
Standardized Residuals
The standardized residual plot shows that approximately 95% of the standardized residuals lie between -2 and +2, which is what a normally distributed error term predicts.
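A sketch of how standardized residuals can be computed, assuming the usual definition: residual i divided by its own standard deviation s·√(1 − hi), where hi = 1/n + (xi − x̄)²/Σ(xj − x̄)² is the leverage of observation i. The data set is illustrative and not part of the notes.

```python
import math

# Illustrative data (an assumption, not given in the notes).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation, then the standardized residual.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
std_residuals = [
    e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)
]

within_2 = sum(1 for z in std_residuals if abs(z) <= 2) / n
for xi, e, z in zip(x, residuals, std_residuals):
    print(f"x={xi:>2}  residual={e:>6.1f}  standardized={z:>7.4f}")
print(f"share within +/-2: {within_2:.0%}")
```

Plotting `residuals` or `std_residuals` against `x` gives the residual plots described in this section; a horizontal band with no pattern supports the constant-variance assumption.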
Normal Probability Plot
Plot the normal scores against the standardized residuals. If the normality assumption holds, the points lie close to a 45-degree line through the origin.
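A sketch of the pairing behind a normal probability plot. Normal scores are the expected values of the order statistics of a standard normal sample; here they are approximated with Blom's formula Φ⁻¹((i − 0.375)/(n + 0.25)), which is an assumption on my part, since the notes do not say how the scores were obtained. The standardized residuals are illustrative values, and `normal_scores` is a hypothetical helper name.

```python
from statistics import NormalDist


def normal_scores(n):
    """Approximate normal scores for a sample of size n (Blom's formula)."""
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


# Illustrative standardized residuals (assumed values, not from the notes).
std_residuals = [-1.0792, 1.2224, -0.9487, 1.4230, -0.2296,
                 -0.2296, -0.2372, 0.7115, -1.7114, 1.0792]

scores = normal_scores(len(std_residuals))
ordered = sorted(std_residuals)

# Each pair is one point on the plot: smallest residual with smallest score, etc.
for score, resid in zip(scores, ordered):
    print(f"normal score={score:>7.3f}  standardized residual={resid:>7.3f}")
```

If the points (score, residual) stray far from the 45-degree line, the normality assumption is in doubt.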
14.9 Residual Analysis: Outliers and Influential Observations
Detecting Outliers
Identify any observation with a standardized residual of less than -2 or greater than +2 as an unusual observation.
Detecting Influential Observations
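The notes stop at the heading here, so the following is only a sketch of one common approach: flagging observations with high leverage, hi = 1/n + (xi − x̄)²/Σ(xj − x̄)². The cutoff hi > 6/n is a rule of thumb used by some statistical packages and is an assumption, not something stated in the notes. The x values are illustrative, with one deliberately extreme point.

```python
# Illustrative x values (an assumption); the last one is far from the rest.
x = [10, 10, 15, 20, 20, 25, 70]
n = len(x)

x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Leverage depends only on x: how far each x_i sits from the mean of x.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

CUTOFF = 6 / n  # assumed rule of thumb for "high leverage"
flagged = [i for i, h in enumerate(leverage) if h > CUTOFF]

for i, (xi, h) in enumerate(zip(x, leverage)):
    mark = "  <- high leverage" if i in flagged else ""
    print(f"obs {i + 1}: x={xi:>2}  h={h:.3f}{mark}")
```

High leverage alone does not prove an observation is influential; it marks points worth checking, for example by refitting without them and comparing the slope.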