Notes on the Theano Deep Learning Tutorials: Modeling and generating sequences of polyphonic music with the RNN-RBM


Tutorial: http://www.deeplearning.net/tutorial/rnnrbm.html#rnnrbm

Code, dataset, and paper: see the tutorial.

 

The RNN-RBM

The RNN-RBM is an energy-based model for density estimation of temporal sequences, where the feature vector v^{(t)} at time step t may be high-dimensional.

It can describe multimodal conditional distributions of v^{(t)} | \mathcal A^{(t)}, where

\mathcal A^{(t)} \equiv \{ v_\tau | \tau < t \}

denotes the sequence history at time t (all frames before t).

Each time step corresponds to one RBM, and the parameters b_v^{(t)}, b_h^{(t)} of that RBM are in turn determined by an RNN (with hidden state u^{(t)}):

b_v^{(t)} = b_v + W_{uv} u^{(t-1)}    (1)

b_h^{(t)} = b_h + W_{uh} u^{(t-1)}    (2)

 

The hidden state of the RNN is given by:

u^{(t)} = \tanh(b_u + W_{uu} u^{(t-1)} + W_{vu} v^{(t)})    (3)
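
To make the coupling concrete, here is a minimal numpy sketch of equations (1)-(3). The sizes and initialization scales simply mirror the tutorial's defaults; nothing in this snippet is part of the tutorial code itself.

# Minimal numpy sketch of equations (1)-(3), with made-up sizes; it only
# illustrates how the previous RNN state u^{(t-1)} sets the biases of the
# RBM at time step t.
import numpy

n_visible, n_hidden, n_recurrent = 88, 150, 100
rng = numpy.random.RandomState(0)
Wuv = rng.normal(0, 0.0001, (n_recurrent, n_visible))
Wuh = rng.normal(0, 0.0001, (n_recurrent, n_hidden))
Wuu = rng.normal(0, 0.0001, (n_recurrent, n_recurrent))
Wvu = rng.normal(0, 0.0001, (n_visible, n_recurrent))
bv = numpy.zeros(n_visible)
bh = numpy.zeros(n_hidden)
bu = numpy.zeros(n_recurrent)

u_prev = numpy.zeros(n_recurrent)                     # u^{(t-1)}
v_t = rng.binomial(1, 0.1, n_visible).astype(float)   # observed frame v^{(t)}

bv_t = bv + u_prev.dot(Wuv)                              # equation (1)
bh_t = bh + u_prev.dot(Wuh)                              # equation (2)
u_t = numpy.tanh(bu + v_t.dot(Wvu) + u_prev.dot(Wuu))    # equation (3)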
 
The resulting model looks like this:

[Figure: RNN-RBM architecture (rnnrbm.png)]
 
 

The overall probability distribution is given by the product over the T time steps in a given sequence:

P(\{v^{(t)}\}) = \prod_{t=1}^T P(v^{(t)} | \mathcal A^{(t)})    (4)

where each right-hand-side factor is the marginalized probability of the t^\mathrm{th} RBM.
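
For training it is the log-likelihood that gets maximized, which turns this product into a sum over time steps:

\log P(\{v^{(t)}\}) = \sum_{t=1}^T \log P(v^{(t)} | \mathcal A^{(t)})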

 

 

Implementation

The tutorial implements two functions: one that trains the RNN-RBM and one that generates sample sequences from it.

During training, \{v^{(t)}\} is given, so the RNN hidden states \{u^{(t)}\} and the parameters \{b_v^{(t)}, b_h^{(t)}\} can all be computed directly. The parameter updates use stochastic gradient descent (SGD), and, as in ordinary RBM training, the gradient is approximated with the contrastive divergence (CD) algorithm.

Sequence generation works much like in a plain RNN, except that at each time step v^{(t)} must be obtained by Gibbs sampling from that step's RBM.
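
In outline, generation looks like the following pseudocode-level sketch. This is not the tutorial's symbolic Theano implementation, and gibbs_sample_rbm is a hypothetical stand-in for the k-step Gibbs chain that build_rbm constructs below.

# Pseudocode-level sketch of the generation loop.
import numpy

def generate_sequence(n_steps, u0, W, bv, bh, bu, Wuv, Wuh, Wvu, Wuu,
                      gibbs_sample_rbm):
    u_prev, sequence = u0, []
    for t in range(n_steps):
        bv_t = bv + u_prev.dot(Wuv)                    # equation (1)
        bh_t = bh + u_prev.dot(Wuh)                    # equation (2)
        v_t = gibbs_sample_rbm(W, bv_t, bh_t, k=25)    # sample the t-th RBM
        u_t = numpy.tanh(bu + v_t.dot(Wvu) + u_prev.dot(Wuu))  # equation (3)
        sequence.append(v_t)
        u_prev = u_t
    return numpy.array(sequence)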

 

The RBM layer

The build_rbm function builds the Gibbs chain for the RBM part of the model (the upper half of the figure). Its input can be a mini-batch (a binary matrix) or a single example (a binary vector, i.e. a mini-batch of size 1).

def build_rbm(v, W, bv, bh, k):
    '''Construct a k-step Gibbs chain starting at v for an RBM.

    v : Theano vector or matrix
        If a matrix, multiple chains will be run in parallel (batch).
    W : Theano matrix
        Weight matrix of the RBM.
    bv : Theano vector
        Visible bias vector of the RBM.
    bh : Theano vector
        Hidden bias vector of the RBM.
    k : scalar or Theano scalar
        Length of the Gibbs chain.

    Return a (v_sample, cost, monitor, updates) tuple:

    v_sample : Theano vector or matrix with the same shape as `v`
        Corresponds to the generated sample(s).
    cost : Theano scalar
        Expression whose gradient with respect to W, bv, bh is the CD-k
        approximation to the log-likelihood of `v` (training example) under the
        RBM. The cost is averaged in the batch case.
    monitor: Theano scalar
        Pseudo log-likelihood (also averaged in the batch case).
    updates: dictionary of Theano variable -> Theano variable
        The `updates` object returned by scan.'''

    def gibbs_step(v):
        mean_h = T.nnet.sigmoid(T.dot(v, W) + bh)
        h = rng.binomial(size=mean_h.shape, n=1, p=mean_h,
                         dtype=theano.config.floatX)
        mean_v = T.nnet.sigmoid(T.dot(h, W.T) + bv)
        v = rng.binomial(size=mean_v.shape, n=1, p=mean_v,
                         dtype=theano.config.floatX)
        return mean_v, v

    chain, updates = theano.scan(lambda v: gibbs_step(v)[1], outputs_info=[v],
                                 n_steps=k)
    v_sample = chain[-1]

    mean_v = gibbs_step(v_sample)[0]
    monitor = T.xlogx.xlogy0(v, mean_v) + T.xlogx.xlogy0(1 - v, 1 - mean_v)
    monitor = monitor.sum() / v.shape[0]

    def free_energy(v):
        return -(v * bv).sum() - T.log(1 + T.exp(T.dot(v, W) + bh)).sum()
    cost = (free_energy(v) - free_energy(v_sample)) / v.shape[0]

    return v_sample, cost, monitor, updates
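
For reference, here is a hypothetical way to use build_rbm on its own as a plain CD-15 trained RBM. The shared-variable setup and the global RandomStreams instance `rng` (which build_rbm's gibbs_step samples through) are assumptions mirroring the full tutorial script, which defines helpers such as shared_normal and shared_zeros.

# Hypothetical standalone usage of build_rbm; not part of the tutorial.
import numpy
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

rng = RandomStreams(seed=42)   # build_rbm's gibbs_step samples through this

n_visible, n_hidden, lr = 88, 150, 0.001
W = theano.shared(numpy.random.normal(
    0, 0.01, (n_visible, n_hidden)).astype(theano.config.floatX))
bv = theano.shared(numpy.zeros(n_visible, dtype=theano.config.floatX))
bh = theano.shared(numpy.zeros(n_hidden, dtype=theano.config.floatX))

v = T.matrix('v')  # a mini-batch of binary frames
v_sample, cost, monitor, updates = build_rbm(v, W, bv, bh, k=15)

# CD-k gradient: treat the negative particles v_sample as constant
gradient = T.grad(cost, [W, bv, bh], consider_constant=[v_sample])
updates.update((p, p - lr * g) for p, g in zip([W, bv, bh], gradient))
train_rbm = theano.function([v], monitor, updates=updates)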


 

The RNN layer

The build_rnnrbm function combines the RNN and the RBM, with the two parts related as in the figure above.

For the RNN part (the lower half of the figure), v^{(t)} is known at training time, so the RNN does not need samples from the RBMs: the hidden states u^{(0)}, ..., u^{(T)} are first computed with equation (3), and the T conditional RBMs are then built all at once from those states.
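
For readers unfamiliar with theano.scan, here is a minimal, self-contained example of the deterministic recurrence pattern used below (a toy tanh recurrence; illustration only, not part of the tutorial):

# Toy theano.scan example: thread a hidden state through a sequence of
# frames, the same pattern recurrence() uses to compute u^{(0)} ... u^{(T)}.
import numpy
import theano
import theano.tensor as T

x = T.matrix('x')        # one frame per row
h0 = T.zeros((3,))       # initial hidden state, like u0 below

h_seq, _ = theano.scan(lambda x_t, h_tm1: T.tanh(x_t + h_tm1),
                       sequences=x, outputs_info=[h0])
f = theano.function([x], h_seq)
print(f(numpy.ones((5, 3), dtype=theano.config.floatX)))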

Once the model is trained, the RNN and the RBM interact during generation: v^{(t)} must be obtained at each time step t by Gibbs sampling from that step's RBM, and the sample is then used to compute the RNN hidden state u^{(t)} and, through equations (2) and (3), the RBMs and RNN states of the subsequent time steps.

def build_rnnrbm(n_visible, n_hidden, n_hidden_recurrent):
    '''Construct a symbolic RNN-RBM and initialize parameters.

    n_visible : integer
        Number of visible units.
    n_hidden : integer
        Number of hidden units of the conditional RBMs.
    n_hidden_recurrent : integer
        Number of hidden units of the RNN.

    Return a (v, v_sample, cost, monitor, params, updates_train, v_t,
    updates_generate) tuple:

    v : Theano matrix
        Symbolic variable holding an input sequence (used during training)
    v_sample : Theano matrix
        Symbolic variable holding the negative particles for CD log-likelihood
        gradient estimation (used during training)
    cost : Theano scalar
        Expression whose gradient (considering v_sample constant) corresponds
        to the LL gradient of the RNN-RBM (used during training)
    monitor : Theano scalar
        Frame-level pseudo-likelihood (useful for monitoring during training)
    params : tuple of Theano shared variables
        The parameters of the model to be optimized during training.
    updates_train : dictionary of Theano variable -> Theano variable
        Update object that should be passed to theano.function when compiling
        the training function.
    v_t : Theano matrix
        Symbolic variable holding a generated sequence (used during sampling)
    updates_generate : dictionary of Theano variable -> Theano variable
        Update object that should be passed to theano.function when compiling
        the generation function.'''

    W = shared_normal(n_visible, n_hidden, 0.01)
    bv = shared_zeros(n_visible)
    bh = shared_zeros(n_hidden)
    Wuh = shared_normal(n_hidden_recurrent, n_hidden, 0.0001)
    Wuv = shared_normal(n_hidden_recurrent, n_visible, 0.0001)
    Wvu = shared_normal(n_visible, n_hidden_recurrent, 0.0001)
    Wuu = shared_normal(n_hidden_recurrent, n_hidden_recurrent, 0.0001)
    bu = shared_zeros(n_hidden_recurrent)

    params = W, bv, bh, Wuh, Wuv, Wvu, Wuu, bu  # learned parameters as shared
                                                # variables

    v = T.matrix()  # a training sequence
    u0 = T.zeros((n_hidden_recurrent,))  # initial value for the RNN hidden
                                         # units

    # If `v_t` is given, deterministic recurrence to compute the variable
    # biases bv_t, bh_t at each time step. If `v_t` is None, same recurrence
    # but with a separate Gibbs chain at each time step to sample (generate)
    # from the RNN-RBM. The resulting sample v_t is returned in order to be
    # passed down to the sequence history.
    def recurrence(v_t, u_tm1):
        bv_t = bv + T.dot(u_tm1, Wuv)
        bh_t = bh + T.dot(u_tm1, Wuh)
        generate = v_t is None
        if generate:
            v_t, _, _, updates = build_rbm(T.zeros((n_visible,)), W, bv_t,
                                           bh_t, k=25)
        u_t = T.tanh(bu + T.dot(v_t, Wvu) + T.dot(u_tm1, Wuu))
        return ([v_t, u_t], updates) if generate else [u_t, bv_t, bh_t]

    # For training, the deterministic recurrence is used to compute all the
    # {bv_t, bh_t, 1 <= t <= T} given v. Conditional RBMs can then be trained
    # in batches using those parameters.
    (u_t, bv_t, bh_t), updates_train = theano.scan(
        lambda v_t, u_tm1, *_: recurrence(v_t, u_tm1),
        sequences=v, outputs_info=[u0, None, None], non_sequences=params)
    v_sample, cost, monitor, updates_rbm = build_rbm(v, W, bv_t[:], bh_t[:],
                                                     k=15)
    updates_train.update(updates_rbm)

    # symbolic loop for sequence generation
    (v_t, u_t), updates_generate = theano.scan(
        lambda u_tm1, *_: recurrence(None, u_tm1),
        outputs_info=[None, u0], non_sequences=params, n_steps=200)

    return (v, v_sample, cost, monitor, params, updates_train, v_t,
            updates_generate)

 

Putting it all together

class RnnRbm:
    '''Simple class to train an RNN-RBM from MIDI files and to generate sample
    sequences.'''

    def __init__(
        self,
        n_hidden=150,
        n_hidden_recurrent=100,
        lr=0.001,
        r=(21, 109),
        dt=0.3
    ):
        '''Constructs and compiles Theano functions for training and sequence
        generation.

        n_hidden : integer
            Number of hidden units of the conditional RBMs.
        n_hidden_recurrent : integer
            Number of hidden units of the RNN.
        lr : float
            Learning rate
        r : (integer, integer) tuple
            Specifies the pitch range of the piano-roll in MIDI note numbers,
            including r[0] but not r[1], such that r[1]-r[0] is the number of
            visible units of the RBM at a given time step. The default (21,
            109) corresponds to the full range of piano (88 notes).
        dt : float
            Sampling period when converting the MIDI files into piano-rolls, or
            equivalently the time difference between consecutive time steps.'''
        self.r = r
        self.dt = dt
        (v, v_sample, cost, monitor, params, updates_train, v_t,
            updates_generate) = build_rnnrbm(
                r[1] - r[0],
                n_hidden,
                n_hidden_recurrent
            )

        gradient = T.grad(cost, params, consider_constant=[v_sample])
        updates_train.update(
            ((p, p - lr * g) for p, g in zip(params, gradient))
        )
        self.train_function = theano.function(
            [v],
            monitor,
            updates=updates_train
        )
        self.generate_function = theano.function(
            [],
            v_t,
            updates=updates_generate
        )

    def train(self, files, batch_size=100, num_epochs=200):
        '''Train the RNN-RBM via stochastic gradient descent (SGD) using MIDI
        files converted to piano-rolls.

        files : list of strings
            List of MIDI files that will be loaded as piano-rolls for training.
        batch_size : integer
            Training sequences will be split into subsequences of at most this
            size before applying the SGD updates.
        num_epochs : integer
            Number of epochs (pass over the training set) performed. The user
            can safely interrupt training with Ctrl+C at any time.'''
        assert len(files) > 0, 'Training set is empty!' \
                               ' (did you download the data files?)'
        dataset = [midiread(f, self.r,
                            self.dt).piano_roll.astype(theano.config.floatX)
                   for f in files]

        try:
            for epoch in range(num_epochs):
                numpy.random.shuffle(dataset)
                costs = []

                for s, sequence in enumerate(dataset):
                    for i in range(0, len(sequence), batch_size):
                        cost = self.train_function(sequence[i:i + batch_size])
                        costs.append(cost)

                print('Epoch %i/%i' % (epoch + 1, num_epochs))
                print(numpy.mean(costs))
                sys.stdout.flush()

        except KeyboardInterrupt:
            print('Interrupted by user.')

    def generate(self, filename, show=True):
        '''Generate a sample sequence, plot the resulting piano-roll and save
        it as a MIDI file.

        filename : string
            A MIDI file will be created at this location.
        show : boolean
            If True, a piano-roll of the generated sequence will be shown.'''
        piano_roll = self.generate_function()
        midiwrite(filename, piano_roll, self.r, self.dt)
        if show:
            extent = (0, self.dt * len(piano_roll)) + self.r
            pylab.figure()
            pylab.imshow(piano_roll.T, origin='lower', aspect='auto',
                         interpolation='nearest', cmap=pylab.cm.gray_r,
                         extent=extent)
            pylab.xlabel('time (s)')
            pylab.ylabel('MIDI note number')
            pylab.title('generated piano-roll')
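
A minimal driver might look like the sketch below. It assumes the tutorial script's imports (pylab, etc.), and the glob pattern for the Nottingham MIDI files is an assumption, not a path the tutorial guarantees; point it at wherever the data was unpacked.

# Hypothetical driver; adjust `pattern` to wherever the MIDI files live.
import glob

if __name__ == '__main__':
    model = RnnRbm()
    pattern = 'data/Nottingham/train/*.mid'   # assumed location
    model.train(glob.glob(pattern), batch_size=100, num_epochs=200)
    model.generate('sample1.mid')
    model.generate('sample2.mid')
    pylab.show()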


 

Results

Training on the Nottingham dataset for 200 epochs takes roughly 24 hours.

The figures below show the piano-rolls of two sample sequences and we provide the corresponding MIDI files:

[Figure: piano-roll of sample sequence 1 (sample1.png)]

Listen to sample1.mid

[Figure: piano-roll of sample sequence 2 (sample2.png)]

Listen to sample2.mid

The tutorial's description of the dataset is rather thin, and it is not obvious what the inputs and outputs are, so here is a short explanation:

As the figures show, the piano-rolls are the input: each sample sequence lasts about 60 seconds (one column of pixels per time step), with black pixels for 1 and white pixels for 0, matching the binary inputs mentioned above. The figures should make it clear what a piano-roll actually is.

Each sample in the Nottingham dataset is a matrix of roughly 150 rows (about 150 time steps) by 88 columns (MIDI note numbers 21-108, i.e. the full piano range from A0 to C8), representing the piano-roll of one tune; every entry is 0 or 1.

The goal is to predict the subsequent piano-roll frames from the piano-roll sequence observed so far.

MIDI note number 60 is "middle C" (C4 in scientific pitch notation). The MIDI note number is simply a numeric encoding of pitch, as used in sheet music and MIDI files; it ranges from 0 to 127 (21-108 for the piano), and larger values mean higher pitch.
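
As a toy illustration (not from the tutorial), here is how a few frames of such a binary matrix map back to MIDI note numbers:

# Toy piano-roll: 4 time steps x 88 notes, then read one frame back out.
import numpy

r = (21, 109)                                  # piano range, 88 visible units
piano_roll = numpy.zeros((4, r[1] - r[0]))     # 4 time steps x 88 notes
piano_roll[0, 60 - r[0]] = 1                   # middle C (MIDI 60) at step 0
piano_roll[0, 64 - r[0]] = 1                   # E above it: a major third

for t, frame in enumerate(piano_roll):
    notes = [r[0] + i for i in numpy.flatnonzero(frame)]
    print('step %i: MIDI notes %s' % (t, notes))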
 
 