Theano linear regression exercise

Source: the Internet | Editor: 程序博客网 | Time: 2024/06/03 13:09

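I have recently been learning the Theano toolkit for deep learning. The most exciting thing about this package is automatic differentiation: once you give it a symbolic formula, it generates the derivative for you, which is one of the things that attracts me most. Most important of all, it can use the CPU or GPU for acceleration in a way that is transparent to the user. Using the CPU alone (6 cores) gave me at least a 10x speedup; my graphics card is too weak, so it defaulted to the CPU. Compared with doing deep learning in MATLAB, I think this library has a lot of appeal.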

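A note on the environment. Because g++ is needed to compile the underlying GPU/CPU-accelerated code, setting up the whole environment by hand is a lot of trouble. I use Enthought Canopy with an academic licence, which pulls in almost all of the dependencies by itself. The Theano version it ships may be rather old and cause various problems, though; you can install the latest one yourself with pip install theano, which removes many of them (Python 2.7).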

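Linear regression should be the most important first step for using this toolkit and getting a feel for how it is debugged and run. As far as I know, no such exercise exists yet, so I designed this example myself. It is very simple, yet it has the complete skeleton of a machine learning program. Most of the ideas come from the Theano tutorial (as of 2014/7/23).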

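The core of the whole Theano toolkit is the tensor library. As a symbolic library it has a number of interesting mechanisms of its own, such as the graph: to understand how differentiation works, that is the part to read. Then there is the great theano.function. Having a derivative only means having a symbolic derivative; to actually compute it and run gradient descent, you need theano.function, the function that connects the ideal to the real.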
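To make "symbolic derivative" concrete, here are the derivatives for the model fitted later in this post, written out by hand. This is only a sketch under my own notation: $C$ is the squared error on one sample $(x, y)$, $a$ and $b$ are the slope and intercept, and $\eta$ is the learning rate (`rate = 0.01` in the code). T.grad is expected to build equivalent expressions automatically.

```latex
\begin{aligned}
C(a, b) &= \bigl(y - (a x + b)\bigr)^2 \\
\frac{\partial C}{\partial a} &= -2\,x\,\bigl(y - (a x + b)\bigr) \\
\frac{\partial C}{\partial b} &= -2\,\bigl(y - (a x + b)\bigr) \\
a &\leftarrow a - \eta\,\frac{\partial C}{\partial a}, \qquad
b \leftarrow b - \eta\,\frac{\partial C}{\partial b}
\end{aligned}
```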

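Here are a few important things. One is theano.shared(). What makes this function powerful is that, first, it returns a tensor-typed variable, and second, the variable it creates behaves like a global variable shared among several functions. To work well with theano.function you need a solid understanding of both points. Below is the signature of theano.function: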

theano.function(inputs, outputs, mode=None, updates=None, givens=None, no_default_updates=False, accept_inplace=False, ...)

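Here the targets of updates have to be shared variables, and if the data source in givens is accessed by index, it can also only be a shared variable (this is my own conclusion; it works, but it may be wrong). This matters especially when the samples used to update the parameters change from step to step.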
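The behaviour of updates and indexed givens can be mimicked in plain Python. The sketch below is not Theano code: `SharedBox` and `make_step` are names made up for illustration, and the only things it tries to model are that state persists between calls, that the output is computed from the values before the update, and that the caller passes just an index while the data stays in shared storage.

```python
class SharedBox(object):
    """Toy stand-in for a shared variable: state that outlives a call."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def make_step(output_fn, updates):
    """Return a callable that computes the output, then applies the updates."""
    def step(*args):
        out = output_fn(*args)  # computed from the old state
        # Evaluate every update rule on the old state before storing anything.
        new_values = [(box, rule(*args)) for box, rule in updates]
        for box, value in new_values:
            box.set_value(value)
        return out
    return step


total = SharedBox(0.0)
data = SharedBox([10.0, 20.0, 30.0])

# Plays the role of givens={x: X[index]}: the caller supplies only an index.
add_sample = make_step(
    output_fn=lambda index: total.get_value(),
    updates=[(total, lambda index: total.get_value() + data.get_value()[index])],
)

first = add_sample(0)   # returns the old total, then stores the new one
second = add_sample(2)
print(first, second, total.get_value())
```

The first call returns the total as it was before the update, which is the same order of events to expect from a function compiled with updates.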

The complete code follows:

    # -*- coding: utf-8 -*-
    from __future__ import print_function

    import time

    import numpy as np
    import theano
    import theano.tensor as T


    class Linear_Reg(object):
        """Model y = a * x + b with shared (trainable) parameters a and b."""

        def __init__(self, x):
            self.a = theano.shared(value=np.zeros((1,), dtype=theano.config.floatX),
                                   name='a')
            self.b = theano.shared(value=np.zeros((1,), dtype=theano.config.floatX),
                                   name='b')
            self.result = self.a * x + self.b
            self.params = [self.a, self.b]

        def msl(self, y):
            # Mean squared loss between the target y and the prediction.
            return T.mean((y - self.result) ** 2)


    def run():
        rate = 0.01
        data = np.linspace(1, 10, 10)
        # y = 3 * x + 1, plus uniform random noise in [0, 1);
        # without the noise the regression would be meaningless.
        labels = data * 3 + 1 + np.random.rand(data.shape[0])
        print(labels)

        # Keep the whole data set in shared variables so givens can index it.
        X = theano.shared(np.asarray(data, dtype=theano.config.floatX), borrow=True)
        Y = theano.shared(np.asarray(labels, dtype=theano.config.floatX), borrow=True)

        index = T.lscalar()
        # T.scalar uses floatX, so x and y match the dtype of X and Y
        # (dscalar would clash with them when floatX is float32).
        x = T.scalar('x')
        y = T.scalar('y')

        reg = Linear_Reg(x=x)
        cost = reg.msl(y)

        a_g = T.grad(cost=cost, wrt=reg.a)
        b_g = T.grad(cost=cost, wrt=reg.b)
        updates = [(reg.a, reg.a - rate * a_g), (reg.b, reg.b - rate * b_g)]

        # One call performs one gradient step on sample number `index`.
        train_model = theano.function(
            inputs=[index],
            outputs=cost,
            updates=updates,
            givens={x: X[index], y: Y[index]},
        )

        err = 0.0
        count = 0
        start_time = time.time()
        while True:
            # One pass over the samples, in order.
            err_s = [train_model(i) for i in range(data.shape[0])]
            err = np.mean(err_s)
            # print(err)
            count += 1
            if count > 10000 or err < 0.1:
                break
        end_time = time.time()

        print('Total time is:', end_time - start_time, 's')  # 5.12s
        print('last error:', err)
        print('a value:', reg.a.get_value())  # [ 2.92394467]
        print('b value:', reg.b.get_value())  # [ 1.81334458]


    run()

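The basic idea is to take one point at a time and do a gradient descent step on it. Strictly speaking the sample points should be picked at random, but for convenience the experiment takes them in order; iterating this many times is only to make sure it converges. The program is mainly meant as an example showing that this framework works; if you handle theano.function in a similar way elsewhere, you can use it to work out where a problem lies. I also posted the code as a template for my own future use, which saves a lot of effort. I have only just started with Theano, so these are rough opinions; corrections are welcome!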
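For comparison, here is the same one-point-at-a-time scheme with the samples visited in random order, written in plain Python without Theano. It is a minimal sketch under my own assumptions: the function name `sgd_linear_fit` is made up, the data is noise-free so the exact answer is known, and the step size is 0.005 rather than the 0.01 used above, to stay on the safe side for the largest x.

```python
import random


def sgd_linear_fit(xs, ys, rate=0.005, epochs=2000, seed=0):
    """Fit y = a * x + b by per-sample gradient descent in shuffled order."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order
        for i in order:
            residual = ys[i] - (a * xs[i] + b)
            # The gradient of residual**2 is -2*x*residual with respect to a
            # and -2*residual with respect to b; step against it.
            a += rate * 2.0 * xs[i] * residual
            b += rate * 2.0 * residual
    return a, b


xs = [float(v) for v in range(1, 11)]
ys = [3.0 * v + 1.0 for v in xs]  # noise-free, so the exact answer is a=3, b=1
a, b = sgd_linear_fit(xs, ys)
print(a, b)
```

Fixing the seed keeps the run reproducible; with noisy labels like those in the program above, the fitted values would only approximate the underlying line.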

0 0