Machine Learning Notes (1): Linear Regression


This post collects some thoughts and a summary from watching Professor Andrew Ng's Stanford machine learning videos, along with the key code from the programming assignments. Discussion is welcome so we can all improve together.

1. A First Look at Machine Learning

My own understanding of machine learning is that it lets a computer imitate human behavior from data, making it something of an early prototype of artificial intelligence (I'm not sure this framing is entirely right). Overall it divides into supervised learning and unsupervised learning. The former splits mainly into regression analysis and classification problems (continuous versus discrete outputs), while the latter mainly concerns clustering algorithms. The difference between the two is whether each training example comes with a known output (label), if that is a fair way to put it.

2. Introducing Linear Regression

Our main topic here is the linear regression problem within regression analysis.

Let's start by looking at some data.


This is a dataset of house prices versus living area: one column is the area (in feet^2) and the other is the price (in units of $1,000). We can plot these points on a coordinate plane.
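The table and scatter plot from the course are not reproduced here; as a stand-in, a minimal MATLAB sketch with made-up (area, price) pairs shows the kind of plot meant (the numbers are purely illustrative):

% Purely illustrative (area, price) pairs, not the actual course data
area  = [2104; 1416; 1534; 852; 3000];   % living area in feet^2
price = [460; 232; 315; 178; 540];       % price in $1000s
figure;
plot(area, price, 'rx', 'MarkerSize', 8);
xlabel('Living area (feet^2)');
ylabel('Price ($1000s)');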


What we want, of course, is to predict the price from the area. Clearly some relationship exists between the two, and that relationship should be recoverable from the data we have. How do we recover it? This is where the linear regression algorithm comes in.

Concretely, we have a dataset with inputs (areas) and outputs (prices). We hypothesize a function h, whose form is initially unknown, train it on the dataset, and solve for the best-fitting function. That is a linear regression problem.


3. Implementing the Linear Regression Algorithm

Earlier we hypothesized a function h for prediction. Since there is only one input here, we can write it as h(x) = theta0 + theta1*x, where theta0 and theta1 are two unknown parameters. Plotted on the coordinate plane, this function is a straight line, and we want to fit that line so it comes as close as possible to our data. How do we decide which parameters are best? We introduce a function J, written out below.
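Written out (this matches the vectorized computeCost code below), the hypothesis and the cost function are:

$$ h_\theta(x) = \theta_0 + \theta_1 x $$

$$ J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 $$

where m is the number of training examples.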


The implementation in MATLAB is as follows:

function J = computeCost(X, y, theta)
m = length(y); % number of training examples

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

% Vectorized cost: J = 1/(2m) * (X*theta - y)' * (X*theta - y)
J = 1/(2*m) * (X*theta - y)' * (X*theta - y);

end
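As a quick sanity check (a sketch; it assumes computeCost.m above is on the MATLAB path), a tiny hand-made dataset where the right answer is obvious:

% Tiny hypothetical dataset: y = 2*x exactly, so theta = [0; 2] fits perfectly
X = [1 1; 1 2; 1 3];        % design matrix: a column of ones, then the feature x
y = [2; 4; 6];
computeCost(X, y, [0; 2])   % expect 0 (perfect fit)
computeCost(X, y, [0; 0])   % expect (4 + 16 + 36) / (2*3) = 9.3333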

We call J the cost function. Its argument is theta, and its value is the (scaled) sum of squared differences between the predicted and true values. What we have to do is minimize this function; the theta at the minimum is the one we need. So how do we minimize J? There are two methods here. The first is gradient descent, whose update rule is written out below.
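The gradient descent update rule is:

$$ \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) $$

which for this cost function works out to (with $x_0^{(i)} = 1$):

$$ \theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$

repeated until convergence.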

J varies with theta, and in this problem it is convex (everywhere around the minimum, J decreases monotonically toward it). In the update rule we take the derivative of J with respect to theta and multiply it by alpha; this walks step by step from a high point on J down to a lower one, until at the minimum the derivative is zero and theta stops moving. The step size is controlled by alpha, called the learning rate: too small and the algorithm is slow, too large and the cost may fail to decrease, instead growing with the number of iterations.

Note that the different parameters must be updated simultaneously, as the sketch below illustrates.
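In the notation of the function below, "simultaneous update" for the two-parameter case means both gradients are computed from the old theta before either is written back, a sketch:

% Simultaneous update: both temporaries use the *old* theta
temp0 = theta(1) - alpha/m * sum(X*theta - y);
temp1 = theta(2) - alpha/m * sum((X*theta - y) .* X(:,2));
theta(1) = temp0;
theta(2) = temp1;
% The vectorized line used in gradientDescent below,
% theta = theta - alpha/m * (X'*(X*theta-y)), performs exactly this
% simultaneous update in one step.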

The MATLAB implementation:

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);                    % number of training examples
J_history = zeros(num_iters, 1);  % record the cost at every iteration

for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    theta = theta - alpha/m * (X' * (X*theta - y));

    J_history(iter) = computeCost(X, y, theta);
end

end
Running gradient descent in this way gives us the optimal parameters.
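A practical way to check that gradient descent is behaving is to plot J_history against the iteration number; the cost should fall on every step. A minimal sketch, assuming X (with its leading column of ones) and y are loaded as in ex1.m below, with the same settings alpha = 0.01 and 1500 iterations:

[theta, J_history] = gradientDescent(X, y, zeros(2, 1), 0.01, 1500);
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Iteration');
ylabel('Cost J');
% If the curve ever rises, alpha is too large; if it flattens only very
% slowly, alpha may be too small.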

The other method is more direct. It is called the normal equation and solves for theta with a single matrix computation (my linear algebra is shaky, so I won't explain it in depth; the code is below).
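For reference, the closed-form solution (matching the pinv line in the code) is:

$$ \theta = (X^{T}X)^{-1}X^{T}y $$

which comes from setting the gradient of J to zero and solving for theta.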

MATLAB code:

function [theta] = normalEqn(X, y)
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
%               to linear regression and put the result in theta.
%
% ---------------------- Sample Solution ----------------------
theta = pinv(X'*X) * X' * y;

end
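A note on the design: the code uses pinv (the Moore-Penrose pseudo-inverse) rather than inv, so it still returns a sensible theta even when X'*X is not invertible, e.g. when features are redundant. As a quick check (a sketch, assuming the single-feature X and y from ex1.m), the normal equation and gradient descent should land on nearly the same parameters:

theta_gd = gradientDescent(X, y, zeros(2, 1), 0.01, 1500);  % iterative solution
theta_ne = normalEqn(X, y);                                 % closed-form solution
disp([theta_gd, theta_ne]);  % the two columns should nearly agree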

The above solves linear regression for a single input. With multiple inputs, the number of feature parameters grows accordingly, and we should also consider whether higher-order (polynomial) terms are appropriate, as sketched below.
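For example, here is a sketch of adding a hypothetical quadratic term to a single feature (variable names follow ex1.m; the squared column is my illustration, not part of the assignment):

x = data(:, 1);                   % the original single feature
m = length(x);
X_poly = [ones(m, 1), x, x.^2];   % design matrix with an added x^2 term
% h(x) = theta0 + theta1*x + theta2*x^2; note that x and x.^2 differ
% greatly in scale, which is exactly where the normalization below helps.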

In addition, during training we may find that features differ by orders of magnitude; solving directly on such data makes gradient descent converge very slowly. In that case we can normalize, scaling the large values down so that all inputs sit on a comparable scale, which speeds things up considerably.

MATLAB code:

function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%
mu = mean(X, 1);
sigma = std(X);
i = 1;
times = size(X, 2);
while i <= times
    X_norm(:,i) = (X(:,i) - mu(1,i)) / sigma(1,i);
    i = i + 1;
end

end
The formula used here is x = (x - mean) / std, i.e. standardization to zero mean and unit standard deviation.
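One detail worth remembering when using featureNormalize: a new input must be normalized with the same mu and sigma learned from the training set before predicting. A sketch, assuming a two-feature X (say, area and number of bedrooms) and an already-trained theta; the specific numbers are just an illustration:

[X_norm, mu, sigma] = featureNormalize(X);
% ... train theta on [ones(m, 1), X_norm] ...
x_new = ([1650, 3] - mu) ./ sigma;   % normalize the new example the same way
price = [1, x_new] * theta;          % prepend the intercept term, then predict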

There is also an alternative formula, x = (x - mean) / (max - min), which scales by the feature's range instead.
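A sketch of that variant (often called mean normalization) for one feature column x:

x_norm = (x - mean(x)) / (max(x) - min(x));   % scale by the range instead of the std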


Finally, here is the full code of the course assignment script ex1.m (the core code for the other files was shown above):

%% Machine Learning Online Class - Exercise 1: Linear Regression
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  linear exercise. You will need to complete the following functions
%  in this exercise:
%
%     warmUpExercise.m
%     plotData.m
%     gradientDescent.m
%     computeCost.m
%     gradientDescentMulti.m
%     computeCostMulti.m
%     featureNormalize.m
%     normalEqn.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%
% x refers to the population size in 10,000s
% y refers to the profit in $10,000s
%

%% Initialization
clear ; close all; clc

%% ==================== Part 1: Basic Function ====================
% Complete warmUpExercise.m
fprintf('Running warmUpExercise ... \n');
fprintf('5x5 Identity Matrix: \n');
warmUpExercise()

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples

% Plot Data
% Note: You have to complete the code in plotData.m
plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =================== Part 3: Gradient descent ===================
fprintf('Running Gradient Descent ...\n')

X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters

% Some gradient descent settings
iterations = 1500;
alpha = 0.01;

% compute and display initial cost
computeCost(X, y, theta)

% run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);

% print theta to screen
fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));

% Plot the linear fit
hold on; % keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] * theta;
fprintf('For population = 35,000, we predict a profit of %f\n',...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n',...
    predict2*10000);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ============= Part 4: Visualizing J(theta_0, theta_1) =============
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% Grid over which we will calculate J
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);

% initialize J_vals to a matrix of 0's
J_vals = zeros(length(theta0_vals), length(theta1_vals));

% Fill out J_vals
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = computeCost(X, y, t);
    end
end

% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';

% Surface plot
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot
figure;
% Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);


