Machine Learning in MATLAB, Hands-On Notes: Linear Regression

Source: Internet | Editor: 程序博客网 | Time: 2024/05/17 08:12

1. Least Mean Squares (LMS) Algorithm
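With a single variable, the hypothesis is $h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$, and the cost function being minimized is $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$.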
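The post later calls computeCost in the gradient descent loop without showing it. A minimal sketch, assuming it returns the squared-error cost $J(\theta)$, is below; it is written as an anonymous function so it can live in a script, and the 3-point data set is hypothetical.

```matlab
% Assumed definition of computeCost: J(theta) = sum((X*theta - y).^2) / (2m)
computeCost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

% Hypothetical data lying exactly on y = 1 + 2x
X = [1 0; 1 1; 1 2];      % first column: intercept ones, second column: x
y = [1; 3; 5];

J_perfect = computeCost(X, y, [1; 2]);   % the line passes through every point
J_zero    = computeCost(X, y, [0; 0]);   % squared errors 1 + 9 + 25 over 2*3
```

With theta = [1; 2] every residual is zero, so the cost is 0; with theta = [0; 0] it is 35/6.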

    data = load('ex1data1.txt');
    X = data(:, 1);   % first column: population
    y = data(:, 2);   % second column: profit

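This first reads the sample data in ex1data1.txt into the variable data, then assigns the first column of data to X and the second column to y. Related indexing forms:
data is a matrix.
data(r, c): r is the row index and c is the column index.
data(:, c): the column vector formed by all rows of column c.
data(:, c1:c2): the matrix formed by all rows of columns c1 through c2.
Y = data(1:2:N, :): rows 1, 3, 5, ..., 2k+1 of data (all columns); N must not exceed the number of rows.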
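The indexing rules above can be checked on a small matrix. This sketch uses a hypothetical 6×3 matrix in place of the real data, and uses end instead of N so the row range can never exceed the matrix size.

```matlab
% Hypothetical 6x3 matrix; MATLAB fills column by column, so
% column 1 = 1..6, column 2 = 7..12, column 3 = 13..18
data = reshape(1:18, 6, 3);

col2    = data(:, 2);          % all rows of column 2 -> 6x1 vector
cols23  = data(:, 2:3);        % all rows of columns 2 through 3 -> 6x2 matrix
oddRows = data(1:2:end, :);    % rows 1, 3, 5, all columns -> 3x3 matrix
elem    = data(4, 3);          % row 4, column 3 -> scalar
```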

    figure;                                    % open a new figure window
    plot(X, y, 'rx', 'MarkerSize', 10);        % plot the data as red crosses
    ylabel('Profit in $10,000s');              % set the y-axis label
    xlabel('Population of City in 10,000s');   % set the x-axis label

The resulting scatter plot looks like this:

[Figure: scatter plot of the training data, profit vs. city population]

    m = length(y);                  % number of training examples
    X = [ones(m, 1), data(:, 1)];   % add a column of ones to X (intercept term)
    theta = zeros(2, 1);            % initialize fitting parameters

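Because $h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$, theta is initialized as a 2×1 vector of zeros. For the product X * theta to produce the intercept term $\theta_0$, a column of m ones (m is the number of training examples) is prepended to X, so that each row of X is $[1, x^{(i)}]$.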
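A small sketch of why the column of ones is needed, assuming three hypothetical population values and an arbitrary example theta: each row of X * theta works out to theta0 * 1 + theta1 * x(i), i.e. the hypothesis for that example.

```matlab
% Hypothetical inputs (population in units of 10,000)
x = [5; 10; 20];
m = length(x);                 % number of examples
X = [ones(m, 1), x];           % prepend the intercept column of ones
theta = [-3; 1.5];             % arbitrary example: theta0 = -3, theta1 = 1.5

h = X * theta;                 % row i: theta0*1 + theta1*x(i)
```

Here h is [4.5; 12; 27], the same as theta(1) + theta(2) * x.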

    iterations = 1500;
    alpha = 0.01;

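Set the number of iterations and the learning rate alpha.
Theta is updated by gradient descent, simultaneously for j = 0, 1:

$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

In MATLAB, theta(1) holds $\theta_0$ and X(:, 1) is the column of ones ($x_0 = 1$). Written out: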

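    % stand-in for computeCost.m (assumed to return sum of squared errors / 2m)
    computeCost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));
    J_history = zeros(iterations, 1);
    for iter = 1:iterations
        % both temps use the old theta, so the update is simultaneous
        temp1 = theta(1) - (alpha / m) * sum((X * theta - y) .* X(:, 1));
        temp2 = theta(2) - (alpha / m) * sum((X * theta - y) .* X(:, 2));
        theta(1) = temp1;
        theta(2) = temp2;
        J_history(iter) = computeCost(X, y, theta);   % cost after this step
    end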
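The two temp lines generalize to a single vectorized update, theta = theta - (alpha/m) * X' * (X*theta - y), because X' * err stacks the sums for every theta_j. This sketch runs both forms side by side on hypothetical data lying on y = 1 + 2x; the data, alpha = 0.1 and the iteration count are assumptions chosen so the run converges.

```matlab
% Hypothetical data on y = 1 + 2x, so the minimum is theta = [1; 2]
X = [1 0; 1 1; 1 2; 1 3];
y = [1; 3; 5; 7];
m = length(y);
alpha = 0.1;
iterations = 2000;

thetaLoop = zeros(2, 1);   % element-wise update, as in the loop above
thetaVec  = zeros(2, 1);   % vectorized update
for iter = 1:iterations
    err = X * thetaLoop - y;
    % build the new vector from the old thetaLoop (simultaneous update)
    thetaLoop = [thetaLoop(1) - (alpha / m) * sum(err .* X(:, 1));
                 thetaLoop(2) - (alpha / m) * sum(err .* X(:, 2))];
    % one line updates every theta_j at once
    thetaVec = thetaVec - (alpha / m) * (X' * (X * thetaVec - y));
end
```

The vectorized line also works unchanged when X has more than two columns.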

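The final theta:
Theta found by gradient descent: -3.630291 1.166362
The value reached after a fixed 1500 iterations also depends on the initial theta; starting from a different initial value gives a different result:
Theta found by gradient descent: -3.570819 1.160388
This happens because 1500 iterations at alpha = 0.01 have not fully converged. The cost $J(\theta)$ of linear regression is a convex quadratic with a single minimum, so with enough iterations every starting point approaches the same theta.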
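A sketch of that effect on hypothetical data lying on y = 1 + 2x: two different starting thetas still disagree after 20 iterations but agree after 5000. The data, alpha, starting points and iteration counts are all assumptions for illustration.

```matlab
% Hypothetical data on y = 1 + 2x, so the unique minimum is theta = [1; 2]
X = [1 0; 1 1; 1 2; 1 3];
y = [1; 3; 5; 7];
m = length(y);
alpha = 0.1;

inits = [0, 5; 0, -5];            % column k is the k-th starting theta
few  = zeros(2, 2);               % theta after 20 iterations, per start
many = zeros(2, 2);               % theta after 5000 iterations, per start
for k = 1:2
    theta = inits(:, k);
    for iter = 1:5000
        theta = theta - (alpha / m) * (X' * (X * theta - y));
        if iter == 20
            few(:, k) = theta;    % snapshot before convergence
        end
    end
    many(:, k) = theta;
end
disp(few); disp(many);
```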
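With theta now known, the hypothesis $h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$ is fully determined. Evaluating it on the sample inputs in X gives the fitted straight line: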

    hold on;                          % draw on top of the scatter plot
    plot(X(:, 2), X * theta, '-');    % fitted line h(x) = X * theta

[Figure: training data with the fitted regression line]
Now the fitted model can be used for prediction:

    predict1 = [1, 3.5] * theta;   % population 35,000 = 3.5 in units of 10,000; result in $10,000s

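For population = 35,000, we predict a profit of 4905.377182 (that is, predict1 * 10000). This value corresponds to the second theta above (-3.570819, 1.160388); the first theta gives about 4519.76.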
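A quick arithmetic check of the prediction, using the two theta values printed above. They are rounded to 6 decimals, so the products match the reported profit only to a few cents.

```matlab
% Theta values printed in the post (rounded)
thetaA = [-3.630291; 1.166362];   % zero initialization
thetaB = [-3.570819; 1.160388];   % a different (unstated) initialization

x = [1, 3.5];                     % intercept term + population 35,000 in units of 10,000
profitA = (x * thetaA) * 10000;   % profit is in $10,000s
profitB = (x * thetaB) * 10000;
fprintf('%.2f  %.2f\n', profitA, profitB);
```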

2. Normal Equation Method
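The normal equation method uses the same setup as above, but needs no iterations: theta is computed in closed form as $\theta = (X^T X)^{-1} X^T y$: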

    theta = (X' * X) \ (X' * y);   % equals inv(X' * X) * X' * y, but solves the system without forming the inverse
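A sketch comparing the normal equation with 1500 steps of gradient descent at alpha = 0.01, on hypothetical data scattered around y = -4 + 1.2x (the sine term is deterministic noise so the run is reproducible). The normal equation lands on the minimum directly, where the gradient X' * (X*theta - y) is zero, while gradient descent with these settings is not guaranteed to have reached it.

```matlab
% Hypothetical data scattered around y = -4 + 1.2x
x = (5:0.5:20)';
y = -4 + 1.2 * x + 0.3 * sin(3 * x);
m = length(y);
X = [ones(m, 1), x];                        % design matrix with intercept column
J = @(t) sum((X * t - y) .^ 2) / (2 * m);   % squared-error cost

% Normal equation: one linear solve, no alpha, no iterations
thetaNE = (X' * X) \ (X' * y);

% Gradient descent with the settings from section 1, for comparison
thetaGD = zeros(2, 1);
alpha = 0.01;
for iter = 1:1500
    thetaGD = thetaGD - (alpha / m) * (X' * (X * thetaGD - y));
end

fprintf('normal equation:  J = %.6f\n', J(thetaNE));
fprintf('gradient descent: J = %.6f\n', J(thetaGD));
```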

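Separately, the learning rate alpha of gradient descent (section 1; the normal equation has no learning rate) also affects the result, and both extremes cause trouble: if alpha is too small, convergence is very slow; if it is too large, each step overshoots the minimum, so the cost can oscillate or even diverge instead of approaching the target value.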
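A sketch of the three regimes on hypothetical data lying on y = 1 + 2x. Which alpha counts as too small or too large depends on the scale of the data, so the values 0.01, 0.1 and 1.0 here are assumptions chosen for this toy set, not the values used on the exercise data.

```matlab
% Hypothetical data on y = 1 + 2x
X = [1 0; 1 1; 1 2; 1 3];
y = [1; 3; 5; 7];
m = length(y);
J = @(theta) sum((X * theta - y) .^ 2) / (2 * m);

alphas = [0.01, 0.1, 1.0];        % too small, reasonable, too large (for this data)
iterations = 50;
J_history = zeros(iterations, numel(alphas));
for k = 1:numel(alphas)
    theta = zeros(2, 1);
    for iter = 1:iterations
        theta = theta - (alphas(k) / m) * (X' * (X * theta - y));
        J_history(iter, k) = J(theta);   % cost after each step
    end
end
% Uncomment to see the curves: plot(J_history); legend('0.01', '0.1', '1.0');
```

With alpha = 0.01 the cost falls slowly, with 0.1 it falls much faster, and with 1.0 it grows every few steps instead of shrinking.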
Cost function curves for different values of alpha:
alpha = 0.1
[Figure: cost function curve for alpha = 0.1]
alpha = 0.5

alpha = 2
1 0