Linear Regression (3) - Gradient Descent in Practice I/II (Feature Scaling / Learning Rate)

Source: Internet. Editor: 程序博客网. Time: 2024/05/22 04:52

Gradient Descent in Practice I - Feature Scaling

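goal: speed up gradient descent by bringing each of our input values into roughly the same range. $θ$ descends quickly on small ranges and slowly on large ranges, so it oscillates inefficiently on the way to the optimum when the features are very uneven.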

$x_i := (x_i - μ_i)/s_i$

where $μ_i$ is the average of all the values for feature $i$ and $s_i$ is either the range of values (max - min) or the standard deviation.
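The scaling formula above can be sketched in a few lines of plain Python. This is only an illustration: the function name `scale_feature`, the `use_std` flag, and the house sizes are made up for this example, and the standard deviation used is the population one.

```python
from statistics import mean, pstdev

def scale_feature(values, use_std=False):
    """Return (x - mu) / s for every value of a single feature column."""
    mu = mean(values)
    # s is either the range (max - min) or the (population) standard deviation
    s = pstdev(values) if use_std else max(values) - min(values)
    if s == 0:
        # A constant feature has no spread, so only center it
        return [0.0 for _ in values]
    return [(x - mu) / s for x in values]

sizes = [1000, 1500, 2000, 2500, 3000]  # hypothetical house sizes
scaled = scale_feature(sizes)
print(scaled)  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```

Dividing by the range keeps every value within a band of width 1 around zero; dividing by the standard deviation gives a different but similarly small spread.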

Gradient Descent in Practice II - Learning Rate

goal: find a suitable learning rate $α$, one for which $J(θ)$ decreases on every iteration.

summary:

If $α$ is too small: slow convergence.

If $α$ is too large: $J(θ)$ may not decrease on every iteration and thus may not converge.
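To see both failure modes, one can record $J(θ)$ after every step and check whether it ever goes up. The sketch below does this for a one-feature linear regression on a tiny made-up data set; the helper names and the three values of $α$ are assumptions chosen for illustration, not recommended settings.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta) for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def run_gradient_descent(xs, ys, alpha, iterations=50):
    """Run batch gradient descent and return the cost after every step."""
    theta0 = theta1 = 0.0
    m = len(xs)
    history = [cost(theta0, theta1, xs, ys)]
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        history.append(cost(theta0, theta1, xs, ys))
    return history

def decreases_every_iteration(history):
    """True if the cost never goes up from one step to the next."""
    return all(later <= earlier for earlier, later in zip(history, history[1:]))

xs = [1.0, 2.0, 3.0, 4.0]  # made-up data lying on y = 2x
ys = [2.0, 4.0, 6.0, 8.0]
results = {alpha: run_gradient_descent(xs, ys, alpha) for alpha in (0.001, 0.1, 0.5)}
for alpha, history in results.items():
    print(alpha, decreases_every_iteration(history), history[-1])
```

On this data the two smaller rates should report a cost that falls on every step, with 0.001 still far from the minimum after 50 iterations, while 0.5 should make the cost grow.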

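goal: improve our features and the form of our hypothesis function so that it fits the data better.

One option is to combine multiple features into one, e.g. a new feature $x_3 = x_1 \cdot x_2$. Another is to create new features from an existing one.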

For example, if our hypothesis function is $h_θ(x) = θ_0 + θ_1 x_1$

then we can create additional features based on $x_1$, to get the quadratic function $h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_1^2$

or the cubic function $h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_1^2 + θ_3 x_1^3$

In the cubic version, we have created new features $x_2$ and $x_3$ where $x_2 = x_1^2$ and $x_3 = x_1^3$.

To make it a square root function, we could do: $h_θ(x) = θ_0 + θ_1 x_1 + θ_2 \sqrt{x_1}$
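Because the new features are just functions of $x_1$, the hypothesis stays linear in the parameters and can be evaluated the same way as before. A minimal sketch, with illustrative helper names and arbitrary parameter values:

```python
import math

def polynomial_features(x1, degree=3):
    """Return [x1, x1^2, ..., x1^degree] built from the single input x1."""
    return [x1 ** p for p in range(1, degree + 1)]

def sqrt_features(x1):
    """Return [x1, sqrt(x1)] for the square root form of the hypothesis."""
    return [x1, math.sqrt(x1)]

def hypothesis(theta, features):
    """theta[0] + theta[1] * f_1 + theta[2] * f_2 + ..."""
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))

cubic = polynomial_features(2.0)  # [2.0, 4.0, 8.0]
theta = [1.0, 0.5, 0.25, 0.125]   # arbitrary values, not fitted to any data
print(hypothesis(theta, cubic))   # 1 + 1 + 1 + 1 = 4.0
print(sqrt_features(9.0))         # [9.0, 3.0]
```

Gradient descent then treats each generated column as an ordinary feature.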

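One important thing to keep in mind: if you choose your features this way, then feature scaling becomes very important. For example, if $x_1$ has range 1 to 1000, then $x_1^2$ has range 1 to 1000000 and $x_1^3$ has range 1 to 1000000000.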
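The effect is easy to check numerically. The sketch below uses made-up values of $x_1$ and a mean normalization by range (the helper names are assumptions for this example) to compare the spread of each polynomial feature before and after scaling.

```python
def value_range(values):
    """Spread of a feature column: max - min."""
    return max(values) - min(values)

def mean_normalize(values):
    """Apply (x - mu) / s with s chosen as the range of the column."""
    mu = sum(values) / len(values)
    s = value_range(values)
    return [(v - mu) / s for v in values]

x1 = [1, 10, 100, 1000]  # made-up raw feature values
columns = {
    "x1": x1,
    "x1^2": [v ** 2 for v in x1],
    "x1^3": [v ** 3 for v in x1],
}
raw_ranges = {name: value_range(col) for name, col in columns.items()}
scaled_ranges = {name: value_range(mean_normalize(col)) for name, col in columns.items()}

for name in columns:
    print(name, raw_ranges[name], scaled_ranges[name])
```

The raw ranges differ by roughly a factor of a thousand at each power, while every scaled column ends up with a range of about 1.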

0 0