TensorFlow Study Notes -- 6. Baidu warp-ctc parameters and test examples, explained (2)

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 03:54

1 Baidu CTC

https://github.com/baidu-research/warp-ctc/blob/master/README.zh_cn.md
Advantage: according to the README, it runs much faster than other CTC implementations.

2 CTC explained

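In short, the idea is to design a loss that does not need labels aligned to the input frames. Minimizing this loss yields accurate recognition, that is, the label sequence can still be decoded from the output even though no alignment was ever given. The benefit is especially clear in speech recognition.
To be continued.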
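To make "decoding without aligned labels" concrete, here is a minimal sketch of greedy (best-path) CTC decoding: take the most likely symbol at each time step, merge consecutive repeats, then drop blanks. This is an illustration only. The function name `ctc_greedy_decode` and the sample frames are assumptions of this sketch and are not part of warp-ctc.

```python
import numpy as np

def ctc_greedy_decode(activations, blank=0):
    """Best-path decoding for one sequence of shape (t, a)."""
    # Most likely symbol index at every time step.
    best = np.argmax(activations, axis=1).tolist()
    decoded = []
    prev = None
    for k in best:
        # Keep a symbol only if it is not a repeat and not the blank.
        if k != prev and k != blank:
            decoded.append(k)
        prev = k
    return decoded

# Four frames, alphabet of 5 symbols, index 0 is the blank.
frames = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],  # symbol 1
    [0.1, 0.6, 0.1, 0.1, 0.1],  # symbol 1 again (merged)
    [0.6, 0.1, 0.1, 0.1, 0.1],  # blank (dropped)
    [0.1, 0.1, 0.6, 0.1, 0.1],  # symbol 2
], dtype=np.float32)

decoded = ctc_greedy_decode(frames)
print(decoded)  # [1, 2]
```

Four input frames map to a two-symbol output, and nothing in the labels says which frame belongs to which symbol.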

3 Reading the Baidu warp-ctc parameters and examples

1 The ctc function

ctc(activations, flat_labels, label_lengths, input_lengths, blank_label=0)
    Computes the CTC loss between a sequence of activations and a
    ground truth labeling.

    Args:
        activations: A 3-D Tensor of floats.  The dimensions
                     should be (t, n, a), where t is the time index, n
                     is the minibatch index, and a indexes over
                     activations for each symbol in the alphabet.
        # These play the role of the logits (the RNN's predicted outputs).
        # As in TensorFlow's time-major layout, the first dimension is the
        # time step t, the second is the batch index n, and the third, a,
        # is the alphabet size (one score per symbol, including the blank).

        flat_labels: A 1-D Tensor of ints, a concatenation of all the
                     labels for the minibatch.
        # flat_labels is a 1-D tensor. A single example whose label
        # sequence is 1, 2 gives [1, 2]. With several examples in a
        # minibatch, their label sequences are flattened end to end: two
        # examples that are both labeled 1, 2 give [1, 2, 1, 2].

        label_lengths: A 1-D Tensor of ints, the length of each label
                       for each example in the minibatch.
        # The length of the label sequence of each example in the
        # minibatch. Because all labels are concatenated into one array,
        # the individual sequences could not be told apart without these
        # lengths.

        input_lengths: A 1-D Tensor of ints, the number of time steps
                       for each sequence in the minibatch.
        # The input length: the number of time steps of each sequence in
        # the minibatch, one entry per example.

        blank_label: int, the label value/index that the CTC
                     calculation should use as the blank label

    # Returns the cost of each example in the minibatch.
    Returns:
        1-D float Tensor, the cost of each example in the minibatch
        (as negative log probabilities).

    * This class performs the softmax operation internally.
    * The label reserved for the blank symbol should be label 0.
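The relationship between `flat_labels` and `label_lengths` can be sketched with plain NumPy. The helpers `flatten_labels` and `unflatten_labels` below are hypothetical names introduced for illustration and are not part of warp-ctc. They only show how per-example label sequences are concatenated, and why the lengths are needed to split them apart again.

```python
import numpy as np

def flatten_labels(label_seqs):
    """Concatenate per-example label sequences into the flat form."""
    label_lengths = np.asarray([len(s) for s in label_seqs], dtype=np.int32)
    flat_labels = np.asarray([l for s in label_seqs for l in s], dtype=np.int32)
    return flat_labels, label_lengths

def unflatten_labels(flat_labels, label_lengths):
    """Recover the per-example sequences from the flat form."""
    # Split points are the cumulative lengths, excluding the final total.
    bounds = np.cumsum(label_lengths)[:-1]
    return [part.tolist() for part in np.split(flat_labels, bounds)]

# Two examples in one minibatch, both labeled 1, 2.
flat, lengths = flatten_labels([[1, 2], [1, 2]])
print(flat)     # [1 2 1 2]
print(lengths)  # [2 2]
print(unflatten_labels(flat, lengths))  # [[1, 2], [1, 2]]
```

Without `lengths`, the flat array `[1, 2, 1, 2]` could just as well be one example with four labels.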

2 Basic test: reading the inputs of _test_basic

import numpy as np

# Initially, activations has shape (2, 5)
activations = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1]
    ], dtype=np.float32)

alphabet_size = 5

# dimensions should be t, n, p: (t timesteps, n minibatches,
# p prob of each alphabet). This is one instance, so expand
# dimensions in the middle
# activations now has shape (2, 1, 5), i.e. (t, batch_size, dims)
activations = np.expand_dims(activations, 1)

# The label sequence of the single example
labels = np.asarray([1, 2], dtype=np.int32)
# Length of the label sequence of each example in the minibatch
label_lengths = np.asarray([2], dtype=np.int32)
# Number of time steps of each input sequence
input_lengths = np.asarray([2], dtype=np.int32)
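For these inputs the cost can be checked by hand. With 2 time steps and the 2-symbol label [1, 2], there is no room for blanks or repeats, so the only valid alignment emits symbol 1 at t=0 and symbol 2 at t=1. The sketch below computes that value with NumPy alone, relying on the docstring's statement that the softmax is applied internally. The local `softmax` helper is written here for illustration and is not a warp-ctc function.

```python
import numpy as np

activations = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1],
], dtype=np.float32)

def softmax(x):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(activations)

# Only one alignment is possible: symbol 1 at t=0, symbol 2 at t=1.
expected_cost = -np.log(probs[0, 1] * probs[1, 2])
print(expected_cost)  # approximately 2.4629
```

The value returned by `ctc` for this example should agree with this number up to floating-point tolerance.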

3 Multi-batch test: reading the inputs

import numpy as np

# Initially, activations has shape (2, 5)
activations = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1]
], dtype=np.float32)

alphabet_size = 5

# dimensions should be t, n, p: (t timesteps, n minibatches,
# p prob of each alphabet). This is one instance, so expand
# dimensions in the middle
# _activations has shape (2, 1, 5), i.e. (t, batch_size, dims)
_activations = np.expand_dims(activations, 1)
# Duplicating the example along the batch axis gives shape (2, 2, 5)
activations = np.concatenate([_activations, _activations[...]], axis=1)

# Flat labels: the label sequences of both examples, concatenated
labels = np.asarray([1, 2, 1, 2], dtype=np.int32)
# Label sequence length of each example, collected into one array
label_lengths = np.asarray([2, 2], dtype=np.int32)
# Number of time steps of each input sequence, also collected into one array
input_lengths = np.asarray([2, 2], dtype=np.int32)
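To see how the four arguments work together on a batch, here is a small pure-NumPy reference of the CTC forward (alpha) recursion. It is a sketch written for this note, not warp-ctc's implementation, and `ctc_cost_reference` is a hypothetical name. It assumes time-major activations of shape (t, n, a), applies a log-softmax internally, and uses `label_lengths` to slice each example's labels out of the flat array.

```python
import numpy as np

def ctc_cost_reference(activations, flat_labels, label_lengths,
                       input_lengths, blank=0):
    """Per-example CTC cost (negative log probability), for illustration."""
    # Log-softmax over the alphabet axis.
    x = activations - activations.max(axis=2, keepdims=True)
    log_probs = x - np.log(np.exp(x).sum(axis=2, keepdims=True))

    costs = []
    offset = 0
    for n, (L, T) in enumerate(zip(label_lengths, input_lengths)):
        # Slice this example's labels out of the flat array.
        labels = flat_labels[offset:offset + L]
        offset += L
        # Extended sequence: blank, l1, blank, l2, ..., blank.
        ext = [blank]
        for l in labels:
            ext += [int(l), blank]
        S = len(ext)

        alpha = np.full(S, -np.inf)
        alpha[0] = log_probs[0, n, ext[0]]
        if S > 1:
            alpha[1] = log_probs[0, n, ext[1]]
        for t in range(1, T):
            new = np.full(S, -np.inf)
            for s in range(S):
                cands = [alpha[s]]
                if s >= 1:
                    cands.append(alpha[s - 1])
                # Skipping a blank is allowed only between different labels.
                if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                    cands.append(alpha[s - 2])
                new[s] = np.logaddexp.reduce(cands) + log_probs[t, n, ext[s]]
            alpha = new
        # Valid paths end on the last label or the trailing blank.
        total = np.logaddexp(alpha[S - 1], alpha[S - 2]) if S > 1 else alpha[0]
        costs.append(-total)
    return np.asarray(costs)

single = np.array([[0.1, 0.6, 0.1, 0.1, 0.1],
                   [0.1, 0.1, 0.6, 0.1, 0.1]], dtype=np.float32)
activations = np.concatenate([single[:, None, :], single[:, None, :]], axis=1)
labels = np.asarray([1, 2, 1, 2], dtype=np.int32)
label_lengths = np.asarray([2, 2], dtype=np.int32)
input_lengths = np.asarray([2, 2], dtype=np.int32)

costs = ctc_cost_reference(activations, labels, label_lengths, input_lengths)
print(costs)  # two equal costs, each approximately 2.4629
```

Because the two examples in the batch are identical, the two costs are identical as well, and each matches the single-example case.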

0 0

TensorFlow学习记录-- 6.百度warp-ctc 参数以及测试例子2解释

1 百度CTC

2 CTC详解

3 解读百度warp-ctc参数以及例子

1 ctc函数

2 基础测试 _test_basic输入解读

3 多batch测试 输入解读

3 多batch测试输入解读