# "Machine Learning" (Andrew Ng), Week 2-2: Octave/Matlab Tutorial


1. Basic Operations

Basic arithmetic and logical operations:

1 == 2

1 ~= 2 

1 && 0

xor(1, 0)

To shorten the command prompt: PS1('>> ');

If you want to assign a variable but don't want the result displayed on screen, add a semicolon at the end of the command. This suppresses the printed output: after you press Enter, nothing is printed.

For more complex screen output, you can use the disp command: disp(A) prints the value of A in much the same way.

ones(2, 3): generates a matrix, in this case a 2x3 matrix whose elements are all 1.

rand(3, 3): generates a 3x3 matrix whose elements are all random values between 0 and 1.

eye(4): a 4x4 identity matrix; likewise, eye(3) is a 3x3 identity matrix.
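The commands above can be tried directly at the Octave prompt; a minimal sketch (the output comments show what Octave prints by default):

```matlab
1 == 2        % ans = 0 (false)
1 ~= 2        % ans = 1 (true)
1 && 0        % ans = 0 (logical AND)
xor(1, 0)     % ans = 1

a = 3;        % trailing semicolon suppresses output
disp(a)       % prints 3

A = ones(2, 3)   % 2x3 matrix of ones
R = rand(3, 3);  % 3x3 matrix of uniform random values in (0, 1)
I = eye(4);      % 4x4 identity matrix
```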

2. Moving Data Around

How do you move data around in Octave? Concretely: if you have a machine learning problem, how do you load the data into Octave? How do you put the data into a matrix? How do you multiply matrices? How do you save the results? How do you move the data around and operate on it?

size(): returns the dimensions of a matrix. In fact, size(A) returns a 1x2 matrix containing the number of rows and the number of columns of A.

size(A, 1): returns the size of the first dimension of A, i.e. the number of rows of A.

size(A, 2): returns the size of the second dimension of A, i.e. the number of columns of A.

length(v): returns the size of the longest dimension. If you type length(A) and A is a 3x2 matrix, the longest dimension is 3, so the command returns 3.

pwd: shows Octave's current working directory.

who: shows the variables currently stored in Octave's workspace.

whos: shows the same information in more detail.

clear: deletes all variables from the workspace.

save: the command for storing data. By default it stores the data in a binary format, or rather a more compressed binary format. If you want to save the data in a form a human can read, type save hello.txt v -ascii, which writes the data to a text file, i.e. stores its ASCII representation as text.
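A short sketch of the inspection and save commands above (the variable names here are just for illustration):

```matlab
A = magic(3);            % example variable to inspect
size(A)                  % ans = 3  3
size(A, 1)               % number of rows: 3
length([1 2 3 4 5])      % longest dimension: 5

who                      % list variables in the workspace
whos                     % detailed listing: name, size, bytes, class

save data.mat A          % compressed binary format
save data.txt A -ascii   % human-readable text format
clear                    % remove all workspace variables
```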

(Indexing)

A(3,2): indexes the (3,2) element of the matrix A. A(2,:) denotes all the elements of the second row of A; A(:,2) returns all the elements of the second column of A.

How do you put matrices together, i.e. take matrices and combine them into larger matrices?
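The indexing and concatenation operations above can be sketched with Octave's bracket syntax (the matrices here are just for illustration):

```matlab
A = [1 2; 3 4; 5 6];          % 3x2 matrix
B = [10 11; 12 13; 14 15];    % another 3x2 matrix

C = [A B]     % horizontal concatenation: a 3x4 matrix
D = [A; B]    % vertical concatenation: a 6x2 matrix

A(3, 2)       % ans = 6, the (3,2) element
A(2, :)       % second row: [3 4]
A(:, 2)       % second column: [2; 4; 6]
```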


How do you load data from the file system and find it?

In fact, on my desktop there are two files, featuresX.dat and priceY.dat, belonging to a machine learning problem I want to solve.

So how should we read this data into Octave?

We only need to type load featuresX.dat; this loads the featuresX file, and likewise we can load priceY.dat. The function form load('featuresX.dat') works as well.

(This assumes the two files are in Octave's current directory. Alternatively, you can add the directory containing the files to Octave's search path.)

Suppose we set the variable v to priceY(1:10); this stores the first 10 elements of the vector priceY in v. Typing who or whos shows that priceY is a 47x1 vector, so v is now a 10x1 vector, because v = priceY(1:10) set v to the first ten elements of priceY. If we want to save it to disk, the command save hello.mat v stores the variable v in a file called hello.mat.
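The workflow just described can be sketched as follows (it assumes featuresX.dat and priceY.dat exist in the current directory, as in the lecture):

```matlab
load featuresX.dat      % creates a variable named featuresX
load('priceY.dat')      % function form; creates a variable priceY

size(priceY)            % in the lecture's example: 47  1

v = priceY(1:10);       % first 10 elements, a 10x1 vector
save hello.mat v        % binary .mat file containing only v
```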

3. Computing on Data

How do you perform computations on the data?

A .* B: element-wise multiplication (each element of A multiplied by the corresponding element of B).

In Octave, the period generally denotes element-wise operations.

1 ./ A gives the element-wise reciprocal of A.

abs(v) takes the absolute value of each element of v.

A' gives the transpose of A.

The magic function returns a matrix called a magic square: all of its rows, columns, and diagonals sum to the same value.
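A minimal sketch of these element-wise operations:

```matlab
A = [1 2; 3 4];
B = [5 6; 7 8];

A .* B          % element-wise product: [5 12; 21 32]
1 ./ A          % element-wise reciprocal of A
abs([-1 2 -3])  % ans = [1 2 3]
A'              % transpose: [1 3; 2 4]

M = magic(3);   % rows, columns and diagonals all sum to 15
sum(M, 1)       % column sums: [15 15 15]
```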

4. Plotting Data

When developing a learning algorithm, a few simple plots often give you a much better understanding of what the algorithm is doing, and let you sanity-check that it is running correctly and achieving its purpose. For example, in an earlier video I talked about plotting the cost function J(θ) to help verify that gradient descent is converging. More generally, plotting the data or all the outputs of a learning algorithm will often suggest how to improve it. Fortunately, Octave has very simple tools for generating many different kinds of plots; when I work on learning algorithms, I find that plotting the data and the algorithm's behavior is often an important source of ideas for improvements.

subplot(1,2,1): divides the figure into a 1x2 grid of cells (the first two arguments) and selects the first cell for drawing (the last argument, 1).
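A minimal plotting sketch along these lines (the data here is just for illustration):

```matlab
t = 0:0.01:1;            % sample points
y1 = sin(2*pi*4*t);
y2 = cos(2*pi*4*t);

subplot(1, 2, 1)         % 1x2 grid, draw in the first cell
plot(t, y1)
xlabel('time'); ylabel('value'); title('sin')

subplot(1, 2, 2)         % draw in the second cell
plot(t, y2, 'r')         % 'r' draws the curve in red
title('cos')

print -dpng 'myPlot.png' % save the figure to a file
```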

5. Control Statements: for, while, if
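The heading refers to Octave's basic control statements; as a quick reference, a minimal sketch of each (the variable values are just for illustration):

```matlab
v = zeros(10, 1);
for i = 1:10
  v(i) = 2^i;            % fill v with powers of 2
end

i = 1;
while i <= 5
  v(i) = 100;            % overwrite the first five entries
  i = i + 1;
end

if v(1) == 100
  disp('the first value is 100');
elseif v(1) == 200
  disp('the first value is 200');
else
  disp('neither 100 nor 200');
end
```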

This part introduces vectorization. Whether you use Octave or another language such as MATLAB, or you are using Python with NumPy, or Java, C, or C++, all of these languages have linear algebra libraries. These libraries are built in, easy to use, and usually very well written and highly optimized, often developed by PhDs in numerical computing or other numerical-computing professionals. When you implement a machine learning algorithm, if you can take good advantage of these linear algebra (or numerical linear algebra) libraries and call into them, rather than re-implementing things the libraries already do, then first, your code will usually be more efficient: it will run faster and make better use of any parallel hardware your computer may have. Second, you will need less code to implement the functionality you want, so the implementation is simpler and less likely to contain bugs.

6. Benefits of Vectorization


Here is the usual linear regression hypothesis. If you want to compute h(x), note that the right-hand side is a sum, so you could compute the terms from j = 0 to j = n and add them up yourself. But another way to think about it is to view h(x) as θ transpose times x; then you can write it as the inner product of two vectors, where θ is (θ0, θ1, θ2) if you have two features (n = 2) and you view x as (x0, x1, x2).
These two ways of thinking give two different implementations. In the unvectorized implementation of h(x), the variable prediction is built up with a for loop: as j ranges from 1 to n+1 (Octave indexing starts at 1), prediction is updated each time by adding theta(j) times x(j) to itself. That is the unvectorized code.

In the vectorized implementation, you treat x and θ as vectors and simply set the variable prediction equal to theta transpose times x. Instead of writing all those lines of for-loop code, you need only one line, which uses Octave's highly optimized numerical linear algebra routines to compute the inner product of the two vectors θ and x. The vectorized implementation is not only simpler, it also runs more efficiently; that is what Octave is good at, and the same vectorized approach works in other programming languages as well.
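The two implementations described above can be sketched in Octave as follows (the values of theta and x here are just for illustration, with x(1) = 1 as the intercept term):

```matlab
theta = [1; 2; 3];    % example parameters (n = 2 features plus intercept)
x = [1; 5; 6];        % example features, x(1) = 1 for the intercept
n = 2;

% Unvectorized: loop over the n+1 terms of the sum
prediction = 0.0;
for j = 1:n+1
  prediction = prediction + theta(j) * x(j);
end
prediction            % ans = 29

% Vectorized: a single inner product
prediction = theta' * x   % ans = 29, same result in one line
```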


Let's now look at a more sophisticated example. Just to remind you, here's our update rule for a gradient descent of a linear regression. And so we update theta j using this rule for all values of j = 0, 1, 2, and so on. And if I just write out these equations for theta 0, theta 1, theta 2, assuming we have two features, so n = 2. Then these are the updates we perform for theta 0, theta 1, theta 2, where you might remember my saying in an earlier video, that these should be simultaneous updates. So, let's see if we can come up with a vectorizing notation of this.


Here are my same three equations written in a slightly smaller font, and you can imagine that one way to implement these three lines of code is to have a for loop that says for j = 0, 1 through 2 to update theta j, or something like that. But instead, let's come up with a vectorized implementation and see if we can have a simpler way to basically compress these three lines of code or a for loop that effectively does these three steps one set at a time. Let's see if we can take these three steps and compress them into one line of vectorized code. Here's the idea.

What I'm going to do is, I'm going to think of theta as a vector, and I'm gonna update theta as theta minus alpha times some other vector delta, where delta is going to be equal to 1 over m, sum from i = 1 through m, and then this term over on the right, okay? So, let me explain what's going on here. Here, I'm going to treat theta as a vector, so this is an n plus one dimensional vector, and I'm saying that theta gets updated; that's a vector in R^(n+1). Alpha is a real number, and delta here is a vector. So this subtraction operation is a vector subtraction, okay? Because alpha times delta is a vector, and so I'm saying theta gets this vector, alpha times delta, subtracted from it. So, what is the vector delta? Well, this vector delta looks like this, and what it's meant to be is really this thing over here. Concretely, delta will be an n plus one dimensional vector, and the very first element of the vector delta is going to be equal to that. So, if we have the delta, if we index it from 0, so it's delta 0, delta 1, delta 2, what I want is that delta 0 is equal to this first box in green up above.

And indeed, you might be able to convince yourself that delta 0 is 1 over m times the sum of (h(x(i)) minus y(i)) times x(i) subscript 0. So, let's just make sure we're on the same page about how delta really is computed. Delta is 1 over m times this sum over here, and what is this sum? Well, this term over here, that's a real number, and the second term over here, x(i), that term over there is a vector, right, because x(i) may be a vector that would be, say, x(i)0, x(i)1, x(i)2, right? And what is the summation? Well, what the summation is saying is that this term, that is this term over here, is equal to (h(x(1)) minus y(1)) times x(1), plus (h(x(2)) minus y(2)) times x(2), and so on, okay? Because this is a summation over i, so as i ranges from i = 1 through m, you get these different terms, and you're summing up these terms here. And the meaning of these terms, this is a lot like, if you remember actually from the earlier quiz in this, right, you saw this equation. We said that in order to vectorize this code we would instead write u = 2v + 5w.

So we're saying that the vector u is equal to two times the vector v plus five times the vector w. So this is an example of how to add up different vectors, and this summation is the same thing. This is saying that the summation over here is just some real number, right? That's kinda like the number two or some other number, times the vector x(1). So it's kinda like 2v, or say some other number times x(1), and then, instead of plus 5w, we instead have some other real number times some other vector, and then you add on other vectors, plus dot, dot, dot, plus the other vectors. Which is why, overall, this thing over here, that whole quantity, that delta, is just some vector. And concretely, the three elements of delta correspond, if n = 2, exactly to this first thing, the second thing, and this third thing. Which is why when you update theta according to theta minus alpha delta, we end up carrying out exactly the same simultaneous updates as the update rules that we have up top.

So, I know that there was a lot that happened on this slide, but again, feel free to pause the video, and if you aren't sure what just happened I'd encourage you to step through this slide to make sure you understand why it is that this update here, with this definition of delta, is equal to the update on top. And if it's still not clear, one insight is that this thing over here is exactly the vector x, and so we're just taking all three of these computations and compressing them into one step with this vector delta, which is why we can come up with a vectorized implementation of this step of linear regression this way. So, I hope this step makes sense, and do look at the video and see if you can understand it. In case you don't quite understand the equivalence of this math, if you implement this, it turns out to be the right answer anyway. So even if you didn't quite understand the equivalence, if you just implement it this way, you'll be able to get linear regression to work. But if you're able to figure out why these two steps are equivalent, then hopefully that will give you a better understanding of vectorization as well. And finally, if you are implementing linear regression using more than one or two features: sometimes we use linear regression with tens or hundreds or thousands of features. If you use the vectorized implementation of linear regression, it will run much faster than your old for loop that was updating theta zero, then theta one, then theta two by itself. So, using a vectorized implementation, you should be able to get a much more efficient implementation of linear regression. And when you vectorize later algorithms that we'll see in this class, this is a good trick, whether in Octave or some other language like C++ or Java, for getting your code to run more efficiently.
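The vectorized gradient descent update described in this passage can be sketched in Octave as follows (X is the m-by-(n+1) design matrix with a leading column of ones, y the m-by-1 target vector; the data and learning rate here are just for illustration):

```matlab
% Example data: m = 4 training examples, n = 1 feature
X = [1 1; 1 2; 1 3; 1 4];   % design matrix, first column all ones
y = [2; 4; 6; 8];           % targets (here exactly y = 2 * x)
m = length(y);

theta = zeros(2, 1);        % initial parameters
alpha = 0.1;                % learning rate

for iter = 1:1000
  delta = (1/m) * X' * (X*theta - y);   % the vector delta from the lecture
  theta = theta - alpha * delta;        % simultaneous update of all theta j
end

theta   % converges toward [0; 2], since y = 2*x fits the data exactly
```

Note that delta is computed in full before theta is touched, which is exactly the simultaneous update the lecture requires; a for loop that updated theta(1), then theta(2) in place would break that property.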
