A quick code example: linear regression using gradient descent


Use linear regression to predict house prices.

The original data is from the Coursera machine learning course.
When I finished the exercise in MATLAB, the idea of implementing the algorithm in Python came up,
so I'd like to refresh the knowledge and have fun with the data :)
The algorithm is so simple that you can scan it quickly and save your time. :)

1. Focus on the data

As a data scientist, the data you have determines how deep beneath the surface of the data you can go.
I need to load the data and keep an eye on the scheme in which it is stored.

import numpy as np

def load_data(filename):
    data = []
    with open(filename, 'rb') as f:
        for line in f:
            line = line.decode('utf-8').strip().split(',')
            data.append([int(_) for _ in line])
    return data

filename = r'ex1data2.txt'
data = load_data(filename)
# look at the first three lines of the data
print('\n'.join([str(data[i]) for i in range(3)]))

The output:

[2104, 3, 399900]
[1600, 3, 329900]
[2400, 3, 369000]
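As an aside, NumPy can parse this kind of comma-separated file directly; the following one-liner is just an equivalent shortcut (assuming the file really contains nothing but comma-separated integers):

# equivalent shortcut: let NumPy parse the comma-separated file
data_arr = np.loadtxt('ex1data2.txt', delimiter=',', dtype=int)
print(data_arr[:3])  # same first three rows as above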

2. Math model

So, what do the three integers in the first line mean?
The first element, 2104, is the size of the house in square feet; the second, 3, is the number of bedrooms; and the last one is the price.
It is time to choose a math model for the data.
Apparently this article is about linear regression, so linear regression it is.

We want to find the parameters $\theta_0, \theta_1, \theta_2$ of the hypothesis $\text{price} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$:

  1. Initialize the vector $\theta = [\theta_0, \theta_1, \theta_2]^T$.

  2. Minimize the error: $\text{error} = \frac{1}{2m}\sum_{i=1}^{m}\left(\text{price}(x^{(i)}) - y^{(i)}\right)^2$.

  3. To achieve the minimization we use the gradient descent algorithm, since the cost function is convex (the update rule is derived just below).
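For reference, here is the vectorized update rule that the implementation below uses; this is just the standard gradient of the error above, nothing beyond it. Writing the training data as a design matrix $X$ (one row per house, with a leading 1) and the prices as a vector $y$, the gradient of the error is

$$\nabla_\theta \, \text{error} = \frac{1}{m} X^T (X\theta - y),$$

so one gradient descent step with learning rate $\alpha$ is

$$\theta \leftarrow \theta - \frac{\alpha}{m} \left( X^T X \theta - X^T y \right).$$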

Talk is cheap, show me the code.

3. Implementation

# normalization
data = np.array(data)
x = data[:, [0, 1]]
y = data[:, 2]
mu = np.mean(x, axis=0)
std = np.std(x, axis=0)
x = (x - mu) / std

# build the design matrix X with a leading column of ones
row = x.shape[0]
X = np.ones((row, 3))
X[:, [1, 2]] = x
X = np.matrix(X)

theta = np.zeros((3, 1))
theta = np.matrix(theta)
y = np.matrix(y)  # note: this is a 1 x m row matrix

# implement the gradient descent method
def grad_descent(X, y, theta, iter_num, alpha):
    # caveat: y is a 1 x m matrix here, so len(y) is 1, not m;
    # the effective step size is therefore alpha rather than alpha/m,
    # which happens to still converge for this data and reproduces
    # the output below
    m = len(y)
    for _ in range(iter_num):
        theta -= alpha / m * (X.T * X * theta - X.T * y.T)
    return theta

# initialize the parameters
iter_num = 900
alpha = 0.01
new_theta = grad_descent(X, y, theta, iter_num, alpha)
print('the theta parameter is:')
print(new_theta)

# estimate the price of a 1650 sq-ft, 3 br house (normalize the inputs first)
price = np.dot(np.array([1, (1650 - mu[0]) / std[0], (3 - mu[1]) / std[1]]), new_theta)
print('for a 1650 sq-ft, 3 br house, the price is')
print(price)

The output:

the theta parameter is:
[[ 340412.65957447]
 [ 109447.79646964]
 [  -6578.35485416]]
for a 1650 sq-ft, 3 br house, the price is
[[ 293081.4643349]]
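If you want to verify that gradient descent is actually converging, you can track the error from step 2 across iterations. This is a sketch I am adding for illustration; compute_cost, y_col and theta_check are names introduced here, not part of the original code:

def compute_cost(X, y_col, theta):
    # the error from step 2: (1/2m) * sum of squared residuals
    m = y_col.shape[0]
    residual = X * theta - y_col
    return (residual.T * residual)[0, 0] / (2 * m)

y_col = y.T  # the targets as an m x 1 column vector
m = X.shape[0]
theta_check = np.matrix(np.zeros((3, 1)))
for i in range(900):
    # the textbook step with the true m; the cost should decrease monotonically
    theta_check -= alpha / m * (X.T * X * theta_check - X.T * y_col)
    if i % 300 == 0:
        print('iteration %d, cost %.2f' % (i, compute_cost(X, y_col, theta_check)))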

4. Normal equation

When the number of features in the data is below about 1000, we usually use the normal equation to compute $\theta$ instead; inverting the $n \times n$ matrix $X^TX$ is cheap when $n$ is small.

What is the relationship between these two methods?

$$\theta_{n+1} = \theta_n - \frac{\alpha}{m}\left(X^TX\theta_n - X^Ty\right)$$

When $n$ goes to infinity, $\theta_{n+1} = \theta_n$, so $X^TX\theta - X^Ty = 0$,

which gives $\theta = (X^TX)^{-1}X^Ty$.

new_X = np.ones((47, 3))
new_X[:, 1:] = data[:, :2]  # raw, unnormalized features this time
new_X = np.matrix(new_X)
new_theta1 = np.linalg.pinv(new_X.T * new_X) * new_X.T * y.T
print(new_theta1)

The output:

[[ 89597.90954435]
 [   139.21067402]
 [ -8738.01911278]]

new_price = np.dot(np.array([1, 1650, 3]), new_theta1)
print(new_price)

The output:

[[ 293081.46433506]]

The two predicted prices are close enough. The two $\theta$ vectors look different only because gradient descent was trained on the normalized features while the normal equation above uses the raw ones, as the sketch below shows.
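To make the connection explicit, here is a small sketch (not in the original post) that maps the gradient descent $\theta$, which lives on the normalized feature scale, back to the raw scale; the result should land close to new_theta1:

# undo the normalization:
# price = t0 + t1*(x1-mu1)/std1 + t2*(x2-mu2)/std2
#       = (t0 - t1*mu1/std1 - t2*mu2/std2) + (t1/std1)*x1 + (t2/std2)*x2
t = np.asarray(new_theta).ravel()
raw_theta = np.array([t[0] - np.sum(t[1:] * mu / std),
                      t[1] / std[0],
                      t[2] / std[1]])
print(raw_theta)  # should be close to new_theta1 from the normal equation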
