Python implementation of a softmax classifier (MNIST dataset)


I've been away from home lately and didn't bring Li Hang's book with me, so the implementations of the algorithms from that book will probably have to wait.
These past few days I've been watching Andrew Ng's machine learning course videos and got to the part on the softmax classifier, where I realized my earlier understanding of the perceptron and logistic regression was off. The real core difference between the two algorithms is their classification function: the perceptron uses a step function as its classifier, while logistic regression uses the sigmoid function. That is the genuine difference between the two models.
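As a minimal sketch of that difference (the function names here are mine, not from the course): both models compute the same linear score $w \cdot x$ and differ only in the function applied to it.

```python
import numpy as np

def perceptron_classify(w, x):
    # Perceptron: a hard threshold (step function) on the linear score.
    return 1 if np.dot(w, x) >= 0 else 0

def logistic_classify(w, x):
    # Logistic regression: the sigmoid maps the same score into (0, 1),
    # which can then be read as P(y = 1 | x).
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))
```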

Enough preamble; today I'm going to implement a softmax classifier.

Algorithm

The algorithm follows Andrew Ng's lecture notes and this article.
While implementing it, I found that adding weight decay improves the results.

To make the code easier to follow, here are the formulas it implements:

$$\nabla_{\Theta_j} J(\Theta) = -x^{(i)}\left(1\{y^{(i)}=j\} - p(y^{(i)}=j \mid x^{(i)};\Theta)\right) + \lambda\Theta_j \tag{1}$$

$$p(y^{(i)}=j \mid x^{(i)};\Theta) = \frac{e^{\Theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\Theta_l^T x^{(i)}}} \tag{2}$$

$$e^{\Theta_l^T x^{(i)}} \tag{3}$$
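For context (this cost function comes from the UFLDL notes and is not spelled out in my code): equation (1) is the gradient of the weight-decayed softmax objective

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)}=j\}\,\log\frac{e^{\Theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\Theta_l^T x^{(i)}}} + \frac{\lambda}{2}\sum_{j=1}^{k}\|\Theta_j\|^2$$

evaluated at a single example (dropping the $\frac{1}{m}$ average), which is exactly the stochastic-gradient update the training loop below performs.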

Dataset

This is the same dataset used in the KNN post.
Data: https://github.com/WenDesi/lihang_book_algorithm/blob/master/data/train.csv

Features

The entire image is used as the feature vector (see the sketch below).
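Concretely, as a quick sanity check (assuming train.csv follows the layout in the repo linked above: one row per image, the label in column 0, then the 28×28 pixels flattened into 784 columns):

```python
import pandas as pd

raw = pd.read_csv('../data/train.csv', header=0)
print(raw.shape)             # expected: (num_samples, 785) = label + 784 pixels

label = raw.values[0, 0]     # digit label of the first image
pixels = raw.values[0, 1:]   # its 784 raw pixel intensities, used as-is
```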

Code

The code has been uploaded to GitHub.

This time the code is Python 3, so it may need minor tweaks on your end. Sorry, but I'm defecting from Python 2.

```python
# encoding=utf8
import math
import random
import time

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


class Softmax(object):

    def __init__(self):
        self.learning_step = 0.000001    # learning rate
        self.max_iteration = 100000      # maximum number of iterations
        self.weight_lambda = 0.01        # weight-decay coefficient

    def cal_e(self, x, l):
        '''Compute formula (3) from this post.'''
        theta_l = self.w[l]
        product = np.dot(theta_l, x)
        return math.exp(product)

    def cal_probability(self, x, j):
        '''Compute formula (2) from this post.'''
        molecule = self.cal_e(x, j)
        denominator = sum([self.cal_e(x, i) for i in range(self.k)])
        return molecule / denominator

    def cal_partial_derivative(self, x, y, j):
        '''Compute formula (1) from this post.'''
        first = int(y == j)                   # the indicator function
        second = self.cal_probability(x, j)   # the predicted probability
        return -x * (first - second) + self.weight_lambda * self.w[j]

    def predict_(self, x):
        result = np.dot(self.w, x)
        row, column = result.shape

        # find which row (class) holds the maximum score
        _position = np.argmax(result)
        m, n = divmod(_position, column)
        return m

    def train(self, features, labels):
        self.k = len(set(labels))
        self.w = np.zeros((self.k, len(features[0]) + 1))

        step = 0
        while step < self.max_iteration:
            print('loop %d' % step)
            step += 1

            # stochastic gradient descent: pick one random training example
            index = random.randint(0, len(labels) - 1)
            x = features[index]
            y = labels[index]

            x = list(x)
            x.append(1.0)      # append the bias term
            x = np.array(x)

            derivatives = [self.cal_partial_derivative(x, y, j)
                           for j in range(self.k)]

            for j in range(self.k):
                self.w[j] -= self.learning_step * derivatives[j]

    def predict(self, features):
        labels = []
        for feature in features:
            x = list(feature)
            x.append(1)
            x = np.matrix(x)
            x = np.transpose(x)
            labels.append(self.predict_(x))
        return labels


if __name__ == '__main__':
    print('Start read data')
    time_1 = time.time()

    raw_data = pd.read_csv('../data/train.csv', header=0)
    data = raw_data.values

    imgs = data[0::, 1::]
    labels = data[::, 0]

    # use 2/3 of the data for training and 1/3 for testing
    train_features, test_features, train_labels, test_labels = train_test_split(
        imgs, labels, test_size=0.33, random_state=23323)

    time_2 = time.time()
    print('read data cost ' + str(time_2 - time_1) + ' second')

    print('Start training')
    p = Softmax()
    p.train(train_features, train_labels)
    time_3 = time.time()
    print('training cost ' + str(time_3 - time_2) + ' second')

    print('Start predicting')
    test_predict = p.predict(test_features)
    time_4 = time.time()
    print('predicting cost ' + str(time_4 - time_3) + ' second')

    score = accuracy_score(test_labels, test_predict)
    print("The accuracy score is " + str(score))
```
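One caveat: cal_e calls math.exp on the raw score $\Theta_l^T x$, which can overflow for large scores since the pixel values are unscaled (0 to 255). A standard fix, shown below as a sketch (stable_probabilities is my name and is not part of the code above), is to subtract the maximum score before exponentiating; the shift cancels in formula (2), so the probabilities are unchanged:

```python
import numpy as np

def stable_probabilities(w, x):
    # Scores for all k classes at once: w has shape (k, d), x has shape (d,).
    scores = np.dot(w, x)
    # Shift so the largest exponent is 0; exp() of a non-positive number
    # cannot overflow, and the shift cancels between numerator and denominator.
    scores -= np.max(scores)
    exp_scores = np.exp(scores)
    return exp_scores / np.sum(exp_scores)
```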

Results


It runs fairly fast. The accuracy is only so-so, but higher than, say, a decision tree.
