Keras for attention


Keras does not yet ship an official attention mechanism, but there are several personal implementations of attention around, so I ran a quick experiment on the MNIST dataset. The model is a bidirectional LSTM + attention + dropout; that said, a bidirectional LSTM is already quite strong on its own.
References:
https://github.com/philipperemy/keras-attention-mechanism
https://github.com/keras-team/keras/issues/1472
Environment: Windows 10, Python 2.7, Keras 2+
The code is as follows:

# mnist attention
import numpy as np
np.random.seed(1337)
from keras.datasets import mnist
from keras.utils import np_utils
from keras.layers import *
from keras.models import *
from keras.optimizers import Adam

TIME_STEPS = 28
INPUT_DIM = 28
lstm_units = 64

# data pre-processing
(X_train, y_train), (X_test, y_test) = mnist.load_data('mnist.npz')
X_train = X_train.reshape(-1, 28, 28) / 255.
X_test = X_test.reshape(-1, 28, 28) / 255.
y_train = np_utils.to_categorical(y_train, num_classes=10)
y_test = np_utils.to_categorical(y_test, num_classes=10)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

# first way attention
def attention_3d_block(inputs):
    # input_dim = int(inputs.shape[2])
    a = Permute((2, 1))(inputs)
    a = Dense(TIME_STEPS, activation='softmax')(a)
    a_probs = Permute((2, 1), name='attention_vec')(a)
    # output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')
    output_attention_mul = multiply([inputs, a_probs], name='attention_mul')
    return output_attention_mul

# build RNN model with attention
inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
drop1 = Dropout(0.3)(inputs)
lstm_out = Bidirectional(LSTM(lstm_units, return_sequences=True), name='bilstm')(drop1)
attention_mul = attention_3d_block(lstm_out)
attention_flatten = Flatten()(attention_mul)
drop2 = Dropout(0.3)(attention_flatten)
output = Dense(10, activation='sigmoid')(drop2)
model = Model(inputs=inputs, outputs=output)

# second way attention
# inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
# units = 32
# activations = LSTM(units, return_sequences=True, name='lstm_layer')(inputs)
#
# attention = Dense(1, activation='tanh')(activations)
# attention = Flatten()(attention)
# attention = Activation('softmax')(attention)
# attention = RepeatVector(units)(attention)
# attention = Permute([2, 1], name='attention_vec')(attention)
# attention_mul = merge([activations, attention], mode='mul', name='attention_mul')
# out_attention_mul = Flatten()(attention_mul)
# output = Dense(10, activation='sigmoid')(out_attention_mul)
# model = Model(inputs=inputs, outputs=output)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
print(model.summary())

print('Training------------')
model.fit(X_train, y_train, epochs=10, batch_size=16)

print('Testing--------------')
loss, accuracy = model.evaluate(X_test, y_test)
print('test loss:', loss)
print('test accuracy:', accuracy)
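Two side notes. First, the commented-out "second way" still uses the Keras 1 merge(..., mode='mul') call; under Keras 2 it needs the same multiply([...]) substitution that the first block already makes. Second, because the softmax layer is named 'attention_vec', the learned attention weights can be pulled out after training and inspected. Below is a minimal sketch of that (my own addition, not from the original post): it builds a sub-model that outputs the 'attention_vec' activations and averages them over the feature axis to get one weight per time step, i.e. per image row.

from keras.models import Model

# sub-model that exposes the attention probabilities of the trained model above
attention_model = Model(inputs=model.input,
                        outputs=model.get_layer('attention_vec').output)

# attention for one test digit: shape (1, TIME_STEPS, 2 * lstm_units)
att = attention_model.predict(X_test[:1])
# average over the feature axis -> one weight per time step (image row)
att_per_step = att.mean(axis=-1).squeeze()
print('attention weight per time step:', att_per_step)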

Results: 98.43% accuracy on the training set and 98.95% on the test set. There does not appear to be any overfitting, so training could be continued further. I have previously run the MNIST example that ships with TensorFlow, where a bidirectional LSTM alone also reaches over 98% accuracy.
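For comparison, here is a minimal sketch of that attention-free baseline (my own, not taken from the TensorFlow example): a plain bidirectional LSTM that reads the 28 image rows and classifies from the final states. It reuses the constants and data defined in the code above; the output uses softmax, the conventional choice for 10-way classification.

from keras.layers import Input, Dropout, Bidirectional, LSTM, Dense
from keras.models import Model

# plain biLSTM baseline, no attention block
base_in = Input(shape=(TIME_STEPS, INPUT_DIM))
x = Dropout(0.3)(base_in)
x = Bidirectional(LSTM(lstm_units))(x)   # return_sequences=False: keep only the final states
x = Dropout(0.3)(x)
base_out = Dense(10, activation='softmax')(x)

baseline = Model(inputs=base_in, outputs=base_out)
baseline.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
baseline.fit(X_train, y_train, epochs=10, batch_size=16)
print(baseline.evaluate(X_test, y_test))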

Blog posts on attention:
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
https://www.cnblogs.com/shixiangwan/p/7573589.html
https://codekansas.github.io/blog/2016/language.html
https://distill.pub/2016/augmented-rnns/

Papers:
《Neural Machine Translation by Jointly Learning to Align and Translate》
《Show, Attend and Tell: Neural Image Caption Generation with Visual Attention》
《Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification》
《Hierarchical Attention Networks for Document Classification》
Comments and discussion are welcome~
