Machine Learning--Andrew Ng--week 1


I spent three half-days getting through the three sets of lectures and felt it was only an introduction; as an experiment I watched without any subtitles. But when I got to the quiz, I found that the straight calculation questions are just high-school (or even middle-school) math for Chinese students, while the concept questions, and multiple-select concept questions at that, are pure hell. There is a big gap between catching the words and actually understanding the meaning. I had to search Baidu for notes written by domestic experts before I figured out what was going on, yet I still failed three times, 3/5 each time, not a pass. Thoroughly beaten, I went to Google for answers, and the fifth attempt finally got me through, orz.

4. 

Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)
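Recall the simultaneous update rule for gradient descent from the lectures, which is what all of the options below are reasoning about:

$\theta_0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} f(\theta_0, \theta_1)$

$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} f(\theta_0, \theta_1)$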

Wrong, do not select: Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.

If α is too small it hurts you: each step becomes tiny, so convergence slows down rather than speeds up.
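As the graded explanations further down also point out, a very small α mainly costs you iterations. A minimal sketch (Python, with a made-up quadratic f that is not from the course) of the effect:

# Gradient descent on the toy function f(t0, t1) = t0^2 + t1^2.
def gradient_descent(alpha, iters=10000, tol=1e-6):
    t0, t1 = 5.0, -3.0                       # arbitrary starting point
    for i in range(iters):
        g0, g1 = 2 * t0, 2 * t1              # partial derivatives of f
        if g0 * g0 + g1 * g1 < tol:          # stop once the gradient is ~0
            return i
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1   # simultaneous update
    return iters

print(gradient_descent(0.1))    # roughly 40 iterations
print(gradient_descent(0.001))  # same minimum, but thousands of iterations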

September 20, first attempt:

September 20, second attempt:
Wrong, do not select: Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.

Reason: for the linear regression problem in this question, there are no local optima other than the global optimum, so gradient descent cannot get stuck at a local optimum.

 

Wrong, do not select: For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0.

 



Link to the expert's answers:

https://raw.githubusercontent.com/DragonflyStats/Coursera-ML/master/Week-02/MLQuizWeek2.tex

4, 0.5, 3

 

NO No matter how $\theta_0$ and $\theta_1$ are initialized, so long as $\alpha$ is sufficiently small, we can safely expect gradient descent to converge to the same solution.

Correct 0.25

This is not true, because depending on the initial condition, gradient descent may end up at different local optima.
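A small illustration (Python, with a made-up 1-D function, not from the quiz) of how the starting point decides which local optimum gradient descent lands in:

# f(t) = t^4 - 2*t^2 has two local minima, at t = -1 and t = +1.
def minimize(t, alpha=0.01, iters=2000):
    for _ in range(iters):
        grad = 4 * t ** 3 - 4 * t   # derivative of f
        t -= alpha * grad           # gradient descent step
    return t

print(minimize(-2.0))  # converges to about -1.0
print(minimize(0.5))   # converges to about +1.0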

 

YES If the learning rate is too small, then gradient descent may take a very long time to converge.

Correct 0.25

If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, and therefore can take a long time to converge.

 

YES If $\theta_0$ and $\theta_1$ are initialized at the global minimum, then one iteration will not change their values.

Correct 0.25

At the global minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters.

 

NO Setting the learning rate $\alpha$ to be very small is not harmful, and can only speed up the convergence of gradient descent.

Correct 0.25

If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.

 

YES If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values.   Incorrect   0.00

At a local minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters.

 

YES If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.   Incorrect   0.00   If alpha were small enough, then gradient descent should always successfully take a tiny small downhill step and decrease f(θ0,θ1) at least a little bit. If gradient descent instead increases the objective value, that means alpha is too large (or you have a bug in your code!).

YES If θ0 and θ1 are initialized at the global minimum, then one iteration will not change their values.   Correct 0.25   At the global minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters.

NO No matter how θ0 and θ1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.   Correct 0.25   This is not true, because depending on the initial condition, gradient descent may end up at different local optima.

NO Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1).   Incorrect   0.00   If the learning rate α is too large, one step of gradient descent can actually vastly "overshoot", and actually increase the value of f(θ0,θ1).
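A tiny sketch (Python, toy function f(t) = t^2, not from the quiz) of this "overshoot" effect when α is too large:

t, alpha = 1.0, 1.5
for i in range(5):
    t = t - alpha * 2 * t    # gradient of t^2 is 2t; with alpha = 1.5 each step multiplies t by -2
    print(i, t, t * t)       # f(t) = t^2 grows each iteration: 4, 16, 64, 256, 1024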

NO If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1.   Incorrect   0.00   The updates to θ0 and θ1 are different (even though we're doing simultaneous updates), so there's no particular reason to expect them to be the same after one iteration of gradient descent.
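For reference, the simultaneous updates for linear regression from the lectures are not symmetric in the two parameters; the $\theta_1$ update carries an extra factor of $x^{(i)}$, so θ0=θ1 generally breaks after one step:

$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$

$\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}$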

%-------------------------------------------------------------------------------------%

 

Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some $\theta_0$, $\theta_1$ such that $J(\theta_0, \theta_1)$=0. Which of the statements below must then be true? (Check all that apply.)
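For reference, the cost function from the lectures is

$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$, where $h_\theta(x) = \theta_0 + \theta_1 x$.

Since J is a sum of squares, $J(\theta_0, \theta_1)=0$ forces every term to be zero, i.e. $h_\theta(x^{(i)}) = y^{(i)}$ for every training example.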

 

Your Answer            Score   Explanation

 

NO We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)   Incorrect   0.00   Even though we can fit our training set perfectly, this does not mean that we'll always make perfect predictions on houses in the future/on houses that we have not yet seen.

 

NO This is not possible: By the definition of J(θ0,θ1), it is not possible for there to exist θ0 and θ1 so that J(θ0,θ1)=0   Correct 0.25   If all of our training examples lie perfectly on a line, then J(θ0,θ1)=0 is possible.

 

YES Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.   Incorrect   0.00   If J(θ0,θ1)=0, that means the line defined by the equation "y=θ0+θ1x" perfectly fits all of our data.

 

NO Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.   Incorrect   0.00   The cost function J(θ0,θ1) for linear regression has no local optima (other than the global minimum), so gradient descent will not get stuck at a bad local minimum.

 

NO For this to be true, we must have y(i)=0 for every value of i=1,2,…,m.   Correct 0.25   So long as all of our training examples lie on a straight line, we will be able to find θ0 and θ1 so that J(θ0,θ1)=0. It is not necessary that y(i)=0 for all of our examples.

NO For this to be true, we must have $\theta_0$=0 and $\theta_1$=0 so that $h_\theta$(x)=0   Correct 0.25

 

If $J(\theta_0, \theta_1)$=0, that means the line defined by the equation "y=$\theta_0$+$\theta_1$x" perfectly fits all of our data. There's no particular reason to expect that the values of $\theta_0$ and $\theta_1$ that achieve this are both 0 (unless y(i)=0 for all of our training examples).
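A quick check (Python, with made-up data that is not from the quiz) that J=0 is possible with nonzero parameters and nonzero y values:

# Toy training set lying exactly on the line y = 2 + 3x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]

def cost(theta0, theta1):
    m = len(xs)
    # J(theta0, theta1) = (1/2m) * sum of squared prediction errors
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(2.0, 3.0))  # 0.0: a perfect fit, even though theta0, theta1 and the y values are all nonzero
print(cost(0.0, 0.0))  # 26.75: theta0 = theta1 = 0 is certainly not required for J = 0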

 

NO This is not possible: By the definition of $J(\theta_0, \theta_1)$, it is not possible for there to exist $\theta_0$ and $\theta_1$ so that $J(\theta_0, \theta_1)$=0

Correct 0.25   If all of our training examples lie perfectly on a line, then $J(\theta_0, \theta_1)$=0 is possible.

 

NO Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.

Incorrect 0.00   The cost function $J(\theta_0, \theta_1)$ for linear regression has no local optima (other than the global minimum), so gradient descent will not get stuck at a bad local minimum.

 

YES Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.

Incorrect 0.00

If $J(\theta_0, \theta_1)$=0, that means the line defined by the equation "y=$\theta_0$+$\theta_1$x" perfectly fits all of our data.

 

YES For these values of θ0 and θ1 that satisfy J(θ0,θ1)=0, we have that hθ(x(i))=y(i) for every training example (x(i),y(i)).   Incorrect   0.00   If J(θ0,θ1)=0, that means the line defined by the equation "y=θ0+θ1x" perfectly fits all of our data.

 

