Linear Regression (3) - Gradient Descent in Practice I/II (Feature Scaling / Learning Rate)

Source: Internet | Editor: 程序博客网 | Time: 2024/05/22 04:52

Gradient Descent in Practice I - Feature Scaling

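Goal: speed up gradient descent by bringing all input values into roughly the same range.

Feature scaling (dividing by the range) and mean normalization (subtracting the mean) can be combined into a single update:

$x_i := \dfrac{x_i - \mu_i}{s_i}$

where $\mu_i$ is the average of all the values for feature $i$, and $s_i$ is either the range of values (max - min) or the standard deviation.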
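As a rough illustration of the formula above, here is a minimal sketch in plain Python. The function name `mean_normalize`, the `use_std` flag, and the sample house sizes are all made up for this example; they do not come from the original notes.

```python
def mean_normalize(values, use_std=False):
    """Return (scaled, mu, s), where scaled[k] = (values[k] - mu) / s.

    s is the range (max - min) by default, or the population
    standard deviation when use_std is True.
    """
    n = len(values)
    mu = sum(values) / n
    if use_std:
        s = (sum((v - mu) ** 2 for v in values) / n) ** 0.5
    else:
        s = max(values) - min(values)
    if s == 0:
        # A constant feature carries no information and cannot be scaled.
        raise ValueError("feature is constant; cannot scale")
    return [(v - mu) / s for v in values], mu, s


# Hypothetical feature column (e.g. house sizes), for illustration only.
sizes = [100.0, 200.0, 300.0, 400.0, 500.0]
scaled, mu, s = mean_normalize(sizes)
print(scaled, mu, s)
```

With the range as $s_i$, the scaled values span an interval of width 1 centred on 0. Using the standard deviation instead gives a different spread but the same zero mean.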

Gradient Descent in Practice II - Learning Rate

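Goal: find a suitable learning rate α, so that J(θ) decreases on every iteration.

A simple way to check this is to plot J(θ) against the number of iterations. If J(θ) ever increases, α is probably too large.

If α is too small: slow convergence.

If α is too large: J(θ) may not decrease on every iteration and thus may not converge.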
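The effect of α can be sketched with a tiny single-feature linear regression in plain Python. The data, the two α values, and the helper names (`cost`, `gradient_descent`, `decreases_every_iteration`) are assumptions chosen for illustration; they are not part of the original notes.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha, iterations):
    """Batch gradient descent from theta = (0, 0); returns thetas and cost history."""
    m = len(xs)
    theta0 = theta1 = 0.0
    history = [cost(theta0, theta1, xs, ys)]
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        history.append(cost(theta0, theta1, xs, ys))
    return theta0, theta1, history


def decreases_every_iteration(history):
    """True if the cost never goes up from one iteration to the next."""
    return all(b <= a for a, b in zip(history, history[1:]))


# Toy data lying exactly on y = 1 + 2x (made up for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

_, _, small = gradient_descent(xs, ys, alpha=0.01, iterations=20)
_, _, large = gradient_descent(xs, ys, alpha=1.0, iterations=20)
print(decreases_every_iteration(small), decreases_every_iteration(large))
```

On this toy data the run with α = 0.01 should show a cost that falls at every step, while the run with α = 1.0 should show a cost that grows instead. In practice, one common approach is to try several values of α and keep the largest one for which the cost still decreases steadily.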

Polynomial Regression

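Goal: improve our features and the form of our hypothesis function.

We can combine multiple features into one. For example, two features $x_1$ and $x_2$ can be combined into a new feature $x_3 = x_1 \cdot x_2$.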

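The hypothesis function also need not be a straight line if that does not fit the data well. For example, if our hypothesis function is $h_\theta(x) = \theta_0 + \theta_1 x_1$

then we can create additional features based on $x_1$ to get the quadratic function $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2$

or the cubic function $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3$

In the cubic version, we have created new features $x_2$ and $x_3$, where $x_2 = x_1^2$ and $x_3 = x_1^3$.

To make it a square root function, we could use: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 \sqrt{x_1}$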

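One important thing to keep in mind is that if you choose your features this way, then feature scaling becomes very important. For example, if $x_1$ has the range 1 - 1000, then the range of $x_1^2$ becomes 1 - 1,000,000 and the range of $x_1^3$ becomes 1 - 1,000,000,000.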
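To make the range problem concrete, the sketch below builds polynomial features from a single input and compares the column ranges before and after mean normalization. The helper names (`polynomial_features`, `column_ranges`, `scale_columns`) and the sample values are hypothetical, and the code uses only the Python standard library.

```python
def polynomial_features(x1_values, degree):
    """Return one row [x1, x1**2, ..., x1**degree] per input value."""
    return [[x1 ** p for p in range(1, degree + 1)] for x1 in x1_values]


def column_ranges(rows):
    """Return max - min for each feature column."""
    return [max(col) - min(col) for col in zip(*rows)]


def scale_columns(rows):
    """Apply (x - mean) / range to every column independently."""
    scaled_cols = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        s = max(col) - min(col)
        scaled_cols.append([(v - mu) / s for v in col])
    # Transpose back to one row per example.
    return [list(row) for row in zip(*scaled_cols)]


# Sample values for x1 spanning 1 to 1000 (illustrative only).
x1 = [1.0, 10.0, 100.0, 1000.0]
rows = polynomial_features(x1, degree=3)

print(column_ranges(rows))                 # raw ranges differ by orders of magnitude
print(column_ranges(scale_columns(rows)))  # each range is approximately 1 after scaling
```

Without scaling, the cubic column dominates the gradient, so a single α that suits one parameter tends to be far too large or far too small for the others.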
