Machine Learning Notes: Simplified Cost Function and Gradient Descent


Simplified Cost Function and Gradient Descent

Note: [6:53 - the gradient descent equation should have a 1/m factor]

We can compress our cost function's two conditional cases into one case:

$$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$$

Notice that when $y$ is equal to 1, then the second term $(1 - y)\log(1 - h_\theta(x))$ will be zero and will not affect the result. If $y$ is equal to 0, then the first term $-y \log(h_\theta(x))$ will be zero and will not affect the result.
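This collapse can be checked numerically (a minimal NumPy sketch; the function name `cost` is illustrative, not from the notes):

```python
import numpy as np

def cost(h, y):
    # Single-case logistic cost: -y*log(h) - (1-y)*log(1-h)
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

# y = 1: only the first term contributes, so cost = -log(h)
print(cost(0.9, 1))  # -log(0.9) ≈ 0.1054
# y = 0: only the second term contributes, so cost = -log(1 - h)
print(cost(0.9, 0))  # -log(0.1) ≈ 2.3026
```

As expected, a confident correct prediction ($h = 0.9$, $y = 1$) is cheap, while the same prediction against $y = 0$ is penalized heavily.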

We can fully write out our entire cost function as follows:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right]$$

A vectorized implementation is:

$$h = g(X\theta)$$

$$J(\theta) = \frac{1}{m}\left(-y^{T}\log(h) - (1 - y)^{T}\log(1 - h)\right)$$
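The vectorized cost translates almost line for line into NumPy (a minimal sketch; `sigmoid`, `compute_cost`, and the toy data are illustrative names, not from the notes):

```python
import numpy as np

def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    # J = (1/m) * (-y^T log(h) - (1-y)^T log(1-h)), with h = g(X theta)
    m = len(y)
    h = sigmoid(X @ theta)
    return float((-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m)

# Sanity check: with theta = 0, h = 0.5 for every example,
# so each example costs -log(0.5) and J = log(2) ≈ 0.6931
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
print(compute_cost(np.zeros(2), X, y))  # ≈ 0.6931
```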

Gradient Descent

Remember that the general form of gradient descent is:

$$\text{Repeat} \; \Big\{ \; \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \; \Big\}$$

We can work out the derivative part using calculus to get:

$$\text{Repeat} \; \Big\{ \; \theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \; \Big\}$$

Notice that this algorithm is identical to the one we used in linear regression. We still have to simultaneously update all values in theta.
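One step of the per-parameter update, written with the simultaneous update made explicit (a sketch; `descent_step` and `sigmoid` are illustrative names):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def descent_step(theta, X, y, alpha):
    # One gradient-descent step: every theta_j is computed from the
    # *old* theta before any component is overwritten (simultaneous update).
    m, n = X.shape
    h = sigmoid(X @ theta)                      # h_theta(x^(i)) for all i
    new_theta = theta.copy()
    for j in range(n):
        grad_j = np.sum((h - y) * X[:, j]) / m  # (1/m) sum (h - y) * x_j
        new_theta[j] = theta[j] - alpha * grad_j
    return new_theta
```

Updating `theta` in place inside the loop would let later components see already-updated earlier components, which is why the old vector is kept intact until all gradients are computed.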

A vectorized implementation is:

$$\theta := \theta - \frac{\alpha}{m} X^{T}\left(g(X\theta) - \vec{y}\right)$$
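The full vectorized loop fits in a few lines of NumPy (a minimal sketch; the function names, learning rate, iteration count, and toy data are illustrative, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(theta, X, y, alpha=0.1, num_iters=1000):
    # Repeat the vectorized update: theta := theta - (alpha/m) X^T (g(X theta) - y)
    m = len(y)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta

# Linearly separable 1-D toy data, with a column of ones for the intercept
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(np.zeros(2), X, y)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)  # should match y
```

The update line is a direct transcription of the vectorized formula above; only the hypothesis $g(X\theta)$ distinguishes it from the linear-regression version.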
