Implementing TCDCN with TensorFlow


This post implements the TCDCN network architecture in TensorFlow. For background on TCDCN, see my other post, a walkthrough of the paper "Facial Landmark Detection by Deep Multi-task Learning".

Code

The code is on GitHub:

https://github.com/flyingzhao/tfTCDCN

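Code walkthrough

MainNet.py defines the structure of the network.

Network structure

The network definition starts by wrapping variable initialization, convolution and pooling in helper functions, following the official tutorial.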

import tensorflow as tf

def weight_variable(shape):  # weights
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):  # biases
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):  # convolution, stride 1, no padding
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='VALID')

def max_pool_2x2(x):  # 2x2 max pooling, stride 2, no padding
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='VALID')

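Note that every convolution and pooling operation in this paper uses VALID mode, meaning the borders are not padded, so each layer's output is slightly smaller than its input.
Next comes the network definition proper. First, define the placeholders that serve as the network's inputs: image is the training image, landmark holds the landmark coordinates, and gender, smile, glasses and headpose are the auxiliary tasks: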

image = tf.placeholder(tf.float32, shape=[None, 40, 40])
landmark = tf.placeholder(tf.float32, shape=[None, 10])
gender = tf.placeholder(tf.float32, shape=[None, 2])
smile = tf.placeholder(tf.float32, shape=[None, 2])
glasses = tf.placeholder(tf.float32, shape=[None, 2])
headpose = tf.placeholder(tf.float32, shape=[None, 5])
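The classification placeholders have a trailing dimension equal to the number of classes (2 for gender, smile and glasses, 5 for head pose), which suggests one-hot labels. Below is a minimal NumPy sketch of preparing such a batch, assuming the raw annotations are integer class indices starting at 0. The helper name `one_hot` and the sample values are illustrative and not taken from the repository.

```python
import numpy as np

def one_hot(indices, num_classes):
    """Convert integer class indices of shape [N] to a float32 one-hot matrix [N, num_classes]."""
    indices = np.asarray(indices, dtype=np.int64)
    encoded = np.zeros((indices.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(indices.shape[0]), indices] = 1.0
    return encoded

# Hypothetical raw labels for a batch of three faces.
gender_batch = one_hot([0, 1, 1], num_classes=2)      # shape (3, 2)
headpose_batch = one_hot([4, 0, 2], num_classes=5)    # shape (3, 5)
```

Arrays built this way can be passed directly as the values for gender, smile, glasses and headpose in feed_dict.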

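Define the layers. Layers 1 to 4 are convolutional and layer 5 is fully connected, with dropout applied to the fully connected layer. The activation function is |tanh|: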

# layer 1: convolution + pooling
W_conv1 = weight_variable([5, 5, 1, 16])
b_conv1 = bias_variable([16])
x_image = tf.reshape(image, [-1, 40, 40, 1])
h_conv1 = tf.abs(tf.nn.tanh(conv2d(x_image, W_conv1) + b_conv1))
h_pool1 = max_pool_2x2(h_conv1)

# layer 2: convolution + pooling
W_conv2 = weight_variable([3, 3, 16, 48])
b_conv2 = bias_variable([48])
h_conv2 = tf.abs(tf.nn.tanh(conv2d(h_pool1, W_conv2) + b_conv2))
h_pool2 = max_pool_2x2(h_conv2)

# layer 3: convolution + pooling
W_conv3 = weight_variable([3, 3, 48, 64])
b_conv3 = bias_variable([64])
h_conv3 = tf.abs(tf.nn.tanh(conv2d(h_pool2, W_conv3) + b_conv3))
h_pool3 = max_pool_2x2(h_conv3)

# layer 4: convolution, no pooling
W_conv4 = weight_variable([2, 2, 64, 64])
b_conv4 = bias_variable([64])
h_conv4 = tf.abs(tf.nn.tanh(conv2d(h_pool3, W_conv4) + b_conv4))
h_pool4 = h_conv4

# layer 5: fully connected
W_fc1 = weight_variable([2 * 2 * 64, 100])
b_fc1 = bias_variable([100])
h_pool4_flat = tf.reshape(h_pool4, [-1, 2 * 2 * 64])
h_fc1 = tf.nn.tanh(tf.matmul(h_pool4_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)  # dropout
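The 2 * 2 * 64 input size of the fully connected layer follows from the VALID arithmetic. The sketch below traces the spatial size of a 40x40 input through layers 1 to 4 in plain Python. The function names are illustrative; the kernel sizes are the ones in the layer definitions above.

```python
def valid_output_size(size, kernel, stride=1):
    """Output length of a VALID convolution or pooling along one spatial dimension."""
    return (size - kernel) // stride + 1

def trace_feature_map(size=40):
    """Follow the spatial size of the input through layers 1 to 4."""
    sizes = [size]
    # (conv kernel, whether a 2x2 max pool follows), as in the layer definitions above
    for kernel, pooled in [(5, True), (3, True), (3, True), (2, False)]:
        size = valid_output_size(size, kernel)            # stride-1 convolution
        if pooled:
            size = valid_output_size(size, 2, stride=2)   # 2x2 max pool, stride 2
        sizes.append(size)
    return sizes

print(trace_feature_map())  # [40, 18, 8, 3, 2]
```

The final feature map is 2x2 with 64 channels, which matches the reshape to [-1, 2 * 2 * 64].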

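With the main network defined, each task gets its own output layer. Landmark detection is a regression model, while the other tasks are classification models: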

# readout layer
# landmark
W_fc_landmark = weight_variable([100, 10])
b_fc_landmark = bias_variable([10])
y_landmark = tf.matmul(h_fc1_drop, W_fc_landmark) + b_fc_landmark

# gender
W_fc_gender = weight_variable([100, 2])
b_fc_gender = bias_variable([2])
y_gender = tf.matmul(h_fc1_drop, W_fc_gender) + b_fc_gender

# smile
W_fc_smile = weight_variable([100, 2])
b_fc_smile = bias_variable([2])
y_smile = tf.matmul(h_fc1_drop, W_fc_smile) + b_fc_smile

# glasses
W_fc_glasses = weight_variable([100, 2])
b_fc_glasses = bias_variable([2])
y_glasses = tf.matmul(h_fc1_drop, W_fc_glasses) + b_fc_glasses

# headpose
W_fc_headpose = weight_variable([100, 5])
b_fc_headpose = bias_variable([5])
y_headpose = tf.matmul(h_fc1_drop, W_fc_headpose) + b_fc_headpose
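The landmark head outputs coordinates directly, whereas the classification heads output unnormalized logits. To read a prediction from a classification head, one would typically apply softmax and take the most probable class. A minimal NumPy sketch, assuming the logits have already been fetched with sess.run; the array values are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up smile logits for a batch of two images.
smile_logits = np.array([[2.0, -1.0], [0.3, 0.9]])

smile_prob = softmax(smile_logits)          # each row sums to 1
smile_class = smile_prob.argmax(axis=1)     # predicted class index per image
```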

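Define the loss function. Landmark detection uses a sum-of-squares loss and the classification tasks use cross-entropy loss. The losses are summed as in the paper, and an L2 regularization term is added on the last-layer weights, as the paper requires: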

error = (
    # 0.5 rather than 1 / 2, which is integer division (0) under Python 2
    0.5 * tf.reduce_sum(tf.square(landmark - y_landmark))
    # cross-entropy terms; keyword arguments avoid mixing up logits and labels
    + tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=y_gender, labels=gender))
    + tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=y_smile, labels=smile))
    + tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=y_glasses, labels=glasses))
    + tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(logits=y_headpose, labels=headpose))
    # L2 regularization on the readout weights
    + tf.nn.l2_loss(W_fc_landmark)
    + tf.nn.l2_loss(W_fc_glasses)
    + tf.nn.l2_loss(W_fc_gender)
    + tf.nn.l2_loss(W_fc_headpose)
    + tf.nn.l2_loss(W_fc_smile)
)
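To make the individual terms concrete, here is a NumPy sketch of the same quantity: half the sum of squared landmark errors, plus summed softmax cross-entropy for each classification head, plus sum(w ** 2) / 2 for each readout weight matrix (which is what tf.nn.l2_loss computes). The function names and the toy inputs are illustrative and are not part of the repository.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)

def l2_loss(weights):
    """Same quantity as tf.nn.l2_loss: sum(w ** 2) / 2."""
    return 0.5 * np.sum(weights ** 2)

def total_loss(landmark_true, landmark_pred, class_heads, readout_weights):
    """class_heads: list of (logits, one_hot_labels); readout_weights: list of weight matrices."""
    loss = 0.5 * np.sum((landmark_true - landmark_pred) ** 2)
    for logits, labels in class_heads:
        loss += softmax_cross_entropy(logits, labels).sum()
    for weights in readout_weights:
        loss += l2_loss(weights)
    return loss

# Toy example: one image, one classification head, one 2x2 weight matrix.
example = total_loss(
    np.zeros((1, 10)), np.ones((1, 10)),
    class_heads=[(np.array([[0.0, 0.0]]), np.array([[1.0, 0.0]]))],
    readout_weights=[np.ones((2, 2))],
)
```

In the toy example the three contributions are 5 (landmarks), ln 2 (cross-entropy for uniform logits over two classes) and 2 (regularization).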

Define the training step. Adam is used as the optimizer:

# train
train_step = tf.train.AdamOptimizer(1e-4).minimize(error)
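For readers unfamiliar with Adam, the sketch below applies the update rule from the Adam paper to a single scalar parameter in plain Python. It is only meant to show what the 1e-4 learning rate controls; TensorFlow's own implementation may differ in details such as where epsilon is applied. The function name `adam_step` and the toy objective are illustrative.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = x ** 2 (gradient 2 * x) starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

Because the update is normalized by the gradient magnitude, each step here moves x by roughly the learning rate, so 100 steps at 1e-4 shift x by only about 0.01.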

Results

With the network defined above, training can begin:

sess.run(train_step,
         feed_dict={image: i, landmark: j, gender: k, smile: l,
                    glasses: m, headpose: n, keep_prob: 0.5})
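The training call feeds keep_prob: 0.5, so half of the fully connected activations are dropped at each step. At evaluation time keep_prob would normally be fed as 1.0 so that nothing is dropped. tf.nn.dropout scales the surviving units by 1 / keep_prob, which the following NumPy sketch imitates; the function and variable names are illustrative.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob, scale survivors by 1 / keep_prob."""
    if keep_prob >= 1.0:
        return activations
    mask = rng.random_sample(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.RandomState(0)
h_fc1 = np.ones((4, 100))
train_out = dropout(h_fc1, 0.5, rng)   # entries are 0.0 or 2.0
eval_out = dropout(h_fc1, 1.0, rng)    # unchanged
```

The scaling keeps the expected activation the same in training and evaluation, so no rescaling is needed when keep_prob is switched to 1.0.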

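The final results fall somewhat short of those in the paper, mainly because:
(1) The training set released by the authors needs further preprocessing, such as face detection.
(2) The authors did not publish several hyperparameters, so the per-task weights cannot be set reliably, nor can the stopping point for task-wise early stopping.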
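To illustrate where the missing hyperparameters would enter, the sketch below shows a weighted sum of task losses together with a simple patience-based rule that switches an auxiliary task off once its validation loss stops improving. This is a simplified stand-in and not the criterion from the paper; the weights, the `patience` parameter and all names are hypothetical.

```python
def weighted_loss(task_losses, task_weights):
    """Combine per-task losses; a weight of 0.0 removes an auxiliary task from training."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())

def update_task_weights(task_weights, val_history, patience=3):
    """Set an auxiliary task's weight to 0 once its validation loss has not improved
    for `patience` consecutive checks. A simplified stand-in, not the paper's criterion."""
    updated = dict(task_weights)
    for name, history in val_history.items():
        if name == "landmark" or len(history) <= patience:
            continue  # never stop the main task; wait for enough history
        best_before = min(history[:-patience])
        if min(history[-patience:]) >= best_before:
            updated[name] = 0.0
    return updated

# Hypothetical weights and validation-loss histories.
weights = {"landmark": 1.0, "smile": 0.5, "gender": 0.5}
history = {"smile": [0.9, 0.7, 0.71, 0.72, 0.73], "gender": [0.9, 0.8, 0.7, 0.6, 0.5]}
weights = update_task_weights(weights, history)
total = weighted_loss({"landmark": 2.0, "smile": 1.0, "gender": 1.0}, weights)
```

In this toy run the smile task has stalled and is switched off, while the gender task is still improving and keeps its weight.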

0 0