Python implementation of Stanford Machine Learning ex2.2
1. Unlike ex2.1, ex2.2 has more than two features. Since no principal component analysis (dimensionality reduction) is performed, a regularization penalty is applied to the weight of every feature.
2. When regularizing the weights, theta0 does not take part in the penalty.
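To make these two notes concrete, the regularized cost function and gradient that the code below implements are the standard ones from the exercise (here h_theta is the sigmoid hypothesis, m the number of samples, and n the number of mapped features):

J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \ge 1)

The regularization sum starts at j = 1, so \theta_0 is never penalized; for j = 0 the \frac{\lambda}{m}\theta_j term is simply dropped.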
# Machine Learning Online Class - Exercise 2: Logistic Regression
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as scop


def loadData(filename):
    # Read comma-separated rows of "test1,test2,label" into feature and label arrays.
    fr = open(filename)
    arrayLines = fr.readlines()
    numberOfLines = len(arrayLines)
    x = np.zeros((numberOfLines, 2))
    y = np.zeros((numberOfLines, 1))
    index = 0
    for line in arrayLines:
        listFormLine = line.strip().split(',')
        x[index, :] = listFormLine[:2]
        y[index] = listFormLine[-1]
        index += 1
    return x, y, numberOfLines


def plotData(x, y):
    f2 = plt.figure(2)
    idx_0 = np.where(y.flatten() == 0)[0]
    plt.scatter(x[idx_0, 0], x[idx_0, 1], color='y', label='y = 0', s=30)
    idx_1 = np.where(y.flatten() == 1)[0]
    plt.scatter(x[idx_1, 0], x[idx_1, 1], marker='+', color='c', label='y = 1', s=50)
    plt.xlabel('Microchip Test 1')
    plt.ylabel('Microchip Test 2')
    plt.legend(loc='upper right')
    return plt


def mapFeature(x1, x2):
    # Map the two input features to all polynomial terms of x1 and x2 up to the
    # sixth power, plus a leading column of ones (28 columns in total).
    degree = 6
    out = np.ones((x1.shape[0], 1))
    for i in range(1, degree + 1):
        for j in range(0, i + 1):
            newColumn = (x1 ** (i - j)) * (x2 ** j)
            out = np.column_stack((out, newColumn))
    return out


def mapFeature1(x1, x2):
    # Scalar version of mapFeature, used to evaluate the decision boundary on a grid.
    degree = 6
    out = np.ones((1, 1))
    for i in range(1, degree + 1):
        for j in range(0, i + 1):
            newColumn = (x1 ** (i - j)) * (x2 ** j)
            out = np.column_stack((out, newColumn))
    return out


def sigmoid(z):
    # In Python, math.log()/math.exp() cannot operate on arrays directly, so use NumPy:
    # http://blog.csdn.net/u013634684/article/details/49305655
    return 1 / (1 + np.exp(-z))


def costFunctionReg(theta, X, y, myLambda):
    m, n = X.shape
    theta = theta.reshape((n, 1))
    theta1 = theta.copy()
    theta1[0] = 0  # theta0 is excluded from the regularization term
    y = y.reshape((m, 1))
    s1 = np.log(sigmoid(np.dot(X, theta)))
    s2 = np.log(1 - sigmoid(np.dot(X, theta)))
    s = y * s1 + (1 - y) * s2
    J = -np.sum(s) / m + myLambda / (2 * m) * np.sum(theta1 ** 2)
    return J


def Gradient(theta, X, y, myLambda):
    m, n = X.shape
    theta = theta.reshape((n, 1))
    theta1 = theta.copy()
    theta1[0] = 0  # theta0 is excluded from the regularization term
    y = y.reshape((m, 1))
    grad = (X.T).dot(sigmoid(np.dot(X, theta)) - y) / m + myLambda / m * theta1
    return grad.flatten()


def plotDecisionBoundary(theta, X, y):
    f2 = plotData(X[:, 1:3], y)
    m, n = X.shape
    if n <= 3:
        # Only need 2 points to define a line, so choose two endpoints
        minVals = X[:, 1].min(0) - 2
        maxVals = X[:, 1].max(0) + 2
        plot_x = np.array([minVals, maxVals])
        plot_y = (-1 / theta[2]) * (plot_x * theta[1] + theta[0])
        f2.plot(plot_x, plot_y, label='Decision Boundary', color='b')
        plt.show()
    else:
        # Evaluate theta' * x over a grid; the decision boundary is the z = 0 contour
        u = np.linspace(-1, 1.5, 50)
        v = np.linspace(-1, 1.5, 50)
        z = np.zeros((len(u), len(v)))
        for i in range(len(u)):
            for j in range(len(v)):
                z[i, j] = mapFeature1(u[i], v[j]).dot(theta)
        z = z.T
        plt.contour(u, v, z, levels=[0])
        plt.show()


if __name__ == '__main__':
    x, y, numberOfLines = loadData('ex2data2.txt')
    plotData(x, y)

    # =========== Part 1: Regularized Logistic Regression ============
    # The data points are not linearly separable, so add polynomial features to
    # the data matrix (similar to polynomial regression).
    # Note that mapFeature also adds a column of ones for us, so the intercept term is handled.
    X = mapFeature(x[:, 0], x[:, 1])

    # Initialize fitting parameters
    mRow, nColumn = X.shape
    initial_theta = np.zeros(nColumn)
    myLambda = 1

    # Compute and display the initial cost and gradient for regularized logistic
    # regression. Without principal component analysis, theta must be regularized
    # to prevent overfitting, but theta0 is conventionally left out of the penalty.
    cost = costFunctionReg(initial_theta, X, y, myLambda)
    grad = Gradient(initial_theta, X, y, myLambda)
    print('Cost at initial theta (zeros):', cost)  # expected (approx): 0.693
    print('grad at initial theta (zeros):', grad)

    # Compute and display cost and gradient with non-zero theta
    test_theta = np.ones(nColumn)
    cost = costFunctionReg(test_theta, X, y, myLambda)
    grad = Gradient(test_theta, X, y, myLambda)
    print('Cost at test theta (ones):', cost)
    print('grad at test theta (ones):', grad)

    # ============= Part 2: Regularization and Accuracies =============
    initial_theta = np.zeros(nColumn)
    myLambda = 1
    Result = scop.minimize(fun=costFunctionReg, x0=initial_theta,
                           args=(X, y, myLambda), method='TNC', jac=Gradient)
    optimalTheta = Result.x
    print('Cost at theta found by scipy.optimize.minimize:', Result.fun)
    print('theta:\n', optimalTheta)

    # =========== Part 3: Plotting the Decision Boundary ============
    plotDecisionBoundary(optimalTheta, X, y)
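The original MATLAB version of the exercise also reports the training-set accuracy after optimization, which this script omits. A minimal sketch of the equivalent check, assuming the variables X, y, and optimalTheta from the script above (the predict helper is not part of the original code):

def predict(theta, X):
    # Predict label 1 whenever the hypothesis sigmoid(X * theta) is at least 0.5.
    return (sigmoid(X.dot(theta)) >= 0.5).astype(int)

p = predict(optimalTheta, X)
# The exercise reports roughly 83.1% train accuracy for lambda = 1.
print('Train Accuracy:', np.mean(p == y.flatten()) * 100)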