Coursera Machine Learning Note - Week 1


Linear Regression with One Variable

Hypothesis function: $h_\theta(x) = \theta_0 + \theta_1 x$
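The hypothesis is just a straight line parameterized by its intercept and slope. A minimal sketch (the concrete parameter values here are hypothetical, chosen only for illustration):

```python
def h(x, theta0, theta1):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With theta0 = 1, theta1 = 2, the input x = 3 maps to 1 + 2*3 = 7.
print(h(3.0, 1.0, 2.0))  # 7.0
```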

Idea: choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$:

$\min_{\theta_0, \theta_1} \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Parameters: $\theta_0, \theta_1$

Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where $m$ is the size of the training set

So, the goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
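The cost function above can be computed directly from the definition. A small sketch with toy data (the data points are made up so that they lie exactly on $y = 1 + 2x$, which makes the cost zero at $\theta_0 = 1, \theta_1 = 2$):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / 2m) * sum_i (h_theta(x_i) - y_i)^2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = len(y)
    predictions = theta0 + theta1 * x          # h_theta(x^(i)) for all i
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy data on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]
print(compute_cost(1.0, 2.0, x, y))  # 0.0 — the line fits perfectly
print(compute_cost(0.0, 0.0, x, y))  # (1 + 9 + 25) / 6 ≈ 5.833
```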

Note that: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( (\theta_0 + \theta_1 x^{(i)})^2 + (y^{(i)})^2 - 2 (\theta_0 + \theta_1 x^{(i)}) y^{(i)} \right)$

It is a function of the parameters $\theta_0, \theta_1$, and its graph looks like the following:

[Figure: plot of $J(\theta_0, \theta_1)$ over the $(\theta_0, \theta_1)$ plane]

Gradient descent algorithm:

  1. Start with some $\theta_0, \theta_1$
  2. Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$, until we hopefully end up at a minimum

repeat until convergence{

$\theta_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
(update $\theta_0, \theta_1$ simultaneously)
}
where $\alpha$ is the learning rate. There is no single correct value; small candidates such as 0.01, 0.03, or 0.1 are commonly tried, and $\alpha$ is tuned by checking that $J$ decreases on every iteration.
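The update loop above can be sketched as follows. This is a minimal illustration, not the course's reference code: the toy data, the fixed iteration count, and the zero initialization are all assumptions made for the example. Note the simultaneous update, implemented by computing both gradients from the old parameters before assigning either.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.03, iters=5000):
    """Gradient descent for one-variable linear regression.

    Repeats the simultaneous update
      theta0 := theta0 - alpha * (1/m) * sum(h(x_i) - y_i)
      theta1 := theta1 - alpha * (1/m) * sum((h(x_i) - y_i) * x_i)
    for a fixed number of iterations.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = len(y)
    theta0 = theta1 = 0.0                       # assumed starting point
    for _ in range(iters):
        err = (theta0 + theta1 * x) - y         # h_theta(x^(i)) - y^(i)
        grad0 = err.sum() / m                   # dJ/dtheta0
        grad1 = (err * x).sum() / m             # dJ/dtheta1
        # Simultaneous update: both gradients use the old theta values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Toy data on the line y = 1 + 2x; the parameters should approach (1, 2).
theta0, theta1 = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(theta0, theta1)
```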
