TensorFlow Study Notes 11: Implementing the Stanford Dogs Example from Section 5.5 of TensorFlow for Machine Intelligence


The Stanford Dogs example in Section 5.5 of TensorFlow for Machine Intelligence (《面向机器智能的tensorflow实践》) is a good exercise for getting started with TensorFlow, but neither the book nor its GitHub repository provides the complete code, and some of the book's snippets no longer run because of TensorFlow version changes. As a TensorFlow beginner, after nearly two weeks of study I finally got the complete code for this section working, and I am sharing it here.

My environment is Win7 x64 + Anaconda 1.6.3 + Spyder 3.2.1 + TensorFlow 1.1.0. The code is split into two parts: writing the TFRecord files, and training and testing the model. Explanations for most of the code can be found in the book; I have added further comments based on my own understanding.
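Since several of the book's snippets only run on particular TensorFlow releases, it is worth confirming your installed version before starting. A minimal check:

import tensorflow as tf
print(tf.__version__)  # these notes were written against 1.1.0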
1. Writing the TFRecord files
The code is as follows:

# -*- coding: utf-8 -*-"""Created on Thu Nov  2 09:29:57 2017@author: XM读取Stanford Dog中的文件,并转换为tensorflow record格式,是进行训练的第1步"""import tensorflow as tfimport globfrom itertools import groupbyfrom collections import defaultdictsess = tf.InteractiveSession()#查找符合一定规则的所有文件,并将文件名以lis形式返回。image_filenames_0 = glob.glob("./imagenet-dogs/n02*/*.jpg")#这句是我添加的。因为读到的路径形式为:'./imagenet-dogs\\n02085620-Chihuahua\\n02085620_10074.jpg',路径分隔符中除第1个之外,都是2个反斜杠,与例程不一致。这里将2个反斜杠替换为斜杠image_filenames = list(    map(lambda image: image.replace('\\', '/'), image_filenames_0))#用list类型初始化training和testing数据集,用defaultdict的好处是为字典中不存在的键提供默认值training_dataset = defaultdict(list)testing_dataset = defaultdict(list)#将品种名从文件名中切分出,image_filename_with_breed是一个迭代器,用list(image_filename_with_breed)将其转换为list,其中的元素类似于:('n02085620-Chihuahua', './imagenet-dogs/n02085620-Chihuahua/n02085620_10131.jpg')。image_filename_with_breed = map(    lambda filename: (filename.split("/")[2], filename), image_filenames)## Group each image by the breed which is the 0th element in the tuple returned above#groupby后得到的是一个迭代器,每个元素的形式为:('n02085620-Chihuahua', <itertools._grouper at 0xd5892e8>),其中第1个元素为种类;第2个元素代表该类的文件,这两个元素也分别对应for循环里的dog_breed和breed_images。for dog_breed, breed_images in groupby(image_filename_with_breed,                                       lambda x: x[0]):    #enumerate的作用是列举breed_images中的所有元素,可同时返回索引和元素,i和breed_image    #的最后一个值分别是:168、('n02116738-African_hunting_dog', './imagenet-dogs/    #n02116738-African_hunting_dog/n02116738_9924.jpg')    for i, breed_image in enumerate(breed_images):        #因为breed_images是按类分别存储的,所以下面是将大约20%的数据作为测试集,大约80%的        #数据作为训练集。        #testing_dataset和training_dataset是两个字典,testing_dataset中        #的第一个元素是 'n02085620-Chihuahua': ['./imagenet-dogs/n02085620-Chihuahua/        #n02085620_10074.jpg', './imagenet-dogs/n02085620-Chihuahua/        #n02085620_11140.jpg',.....]        if i % 5 == 0:            testing_dataset[dog_breed].append(breed_image[1])        else:            training_dataset[dog_breed].append(breed_image[1])    # 测试每种类型下的测试集是否至少包含了18%的数据    breed_training_count = len(training_dataset[dog_breed])    breed_testing_count = len(testing_dataset[dog_breed])    assert round(breed_testing_count /                 (breed_training_count + breed_testing_count),                 2) > 0.18, "Not enough testing images."def write_records_file(dataset, record_location):    """    Fill a TFRecords file with the images found in `dataset` and include their category.    Parameters    ----------    dataset : dict(list)      Dictionary with each key being a label for the list of image filenames of its value.    record_location : str      Location to store the TFRecord output.    """    writer = None    # Enumerating the dataset because the current index is used to breakup the files if they get over 100    # images to avoid a slowdown in writing.    
current_index = 0    #遍历每一种类型的所有文件    for breed, images_filenames in dataset.items():        #遍历每一个文件        for image_filename in images_filenames:            if current_index % 100 == 0:                if writer:                    writer.close()                #创建tensorflow record的文件名                record_filename = "{record_location}-{current_index}.tfrecords".format(                    record_location=record_location,                    current_index=current_index)                writer = tf.python_io.TFRecordWriter(record_filename)            current_index += 1            image_file = tf.read_file(image_filename)            #将图片按照jpeg格式解析,ImageNet dogs中有些图片按照JPEG解析时会出错,用try            #语句忽视解析错误的图片。            try:                image = tf.image.decode_jpeg(image_file)            except:                print(image_filename)                continue            # 转换为灰度图像.            grayscale_image = tf.image.rgb_to_grayscale(image)            #此处做了修改,resize_images的第二个参数要求是tensor,原代码有误。            #resized_image = tf.image.resize_images(grayscale_image, 250, 151)            resized_image = tf.image.resize_images(grayscale_image, [250, 151])            # tf.cast is used here because the resized images are floats but haven't been converted into            # image floats where an RGB value is between [0,1).            image_bytes = sess.run(tf.cast(resized_image, tf.uint8)).tobytes()            # Instead of using the label as a string, it'd be more efficient to turn it into either an            # integer index or a one-hot encoded rank one tensor.            # https://en.wikipedia.org/wiki/One-hot            #将表示种类的字符串转换为python默认的utf-8格式,防止有问题            image_label = breed.encode("utf-8")            ## 创建一个 example protocol buffer 。            # 其中,feature={            # 'label':            # tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_label])),            # 'image':            # tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes]))            # })是创建1个属性            example = tf.train.Example(                features=tf.train.Features(feature={                    'label':                    tf.train.Feature(bytes_list=tf.train.BytesList(                        value=[image_label])),                    'image':                    tf.train.Feature(bytes_list=tf.train.BytesList(                        value=[image_bytes]))                }))            #SerializeToString()将文件序列化为二进制字符串            writer.write(example.SerializeToString())    writer.close()#分别将测试数据和训练数据写入tensorflow record,分别保存在文件夹./output/testing-images/和./output/#training-images/下面。write_records_file(testing_dataset, "./output/testing-images/testing-image")write_records_file(training_dataset, "./output/training-images/training-image")
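After the conversion finishes, a quick sanity check is to read one record back and confirm that the label and image fields parse. A minimal sketch, assuming the first testing shard produced above is named testing-image-0.tfrecords:

# Spot-check one record from a TFRecord shard written above.
import tensorflow as tf

path = "./output/testing-images/testing-image-0.tfrecords"
for serialized in tf.python_io.tf_record_iterator(path):
    example = tf.train.Example()
    example.ParseFromString(serialized)
    label = example.features.feature['label'].bytes_list.value[0]
    image_bytes = example.features.feature['image'].bytes_list.value[0]
    # A 250x151 grayscale image should yield 37750 bytes.
    print("label:", label.decode("utf-8"), "| image bytes:", len(image_bytes))
    break  # one record is enough for a spot check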

A few notes:

(1) Before writing the TFRecord files, you must first create the directories ./output/testing-images/ and ./output/training-images/ under the script's path, or the program will fail (a short Python sketch for creating them follows the modified write_records_file below).
(2) Running the whole conversion on a CPU takes a long time: my Dell XPS needed almost 18 hours. My run also aborted with the following error:

InvalidArgumentError: Invalid JPEG data, size 107746
     [[Node: DecodeJpeg_15166 = DecodeJpeg[acceptable_fraction=1, channels=0, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](ReadFile_15166)]]

Someone asked about this error on Stack Overflow, where the suggested cause was plugging or unplugging a USB drive during the run. I had indeed removed a USB drive; I was away at lunch at the time and came back to find the run had stopped with this error. (Note that the try/except around tf.image.decode_jpeg cannot catch this kind of failure: decode_jpeg only adds an op to the graph, and the InvalidArgumentError is raised later, inside the sess.run call that actually decodes the image.) When this happens you can either restart the conversion from scratch, or check how many files were already written and resume from there with a small modification to write_records_file. My modified version is as follows:

def write_records_file(dataset, record_location):
    """
    Fill a TFRecords file with the images found in `dataset` and include
    their category.

    Parameters
    ----------
    dataset : dict(list)
      Dictionary with each key being a label for the list of image
      filenames of its value.
    record_location : str
      Location to store the TFRecord output.
    """
    writer = None

    # Enumerating the dataset because the current index is used to break up
    # the files if they get over 100 images, to avoid a slowdown in writing.
    current_index = 0
    for breed, images_filenames in dataset.items():
        for image_filename in images_filenames:
            # current_index += 1
            # cc = current_index - 1
            # In my run the TFRecord file index had reached 11000, so skip
            # everything before that and continue saving from there on.
            if current_index >= 11000:
                if current_index % 100 == 0:
                    if writer:
                        writer.close()
                    record_filename = "{record_location}-{current_index}.tfrecords".format(
                        record_location=record_location,
                        current_index=current_index)
                    # print(record_filename)
                    writer = tf.python_io.TFRecordWriter(record_filename)
                # current_index += 1
                image_file = tf.read_file(image_filename)

                # In ImageNet dogs, there are a few images which TensorFlow
                # doesn't recognize as JPEGs. This try/catch will ignore those
                # images.
                try:
                    image = tf.image.decode_jpeg(image_file)
                except:
                    print(image_filename)
                    continue

                # Converting to grayscale saves processing and memory but
                # isn't required.
                grayscale_image = tf.image.rgb_to_grayscale(image)
                # resized_image = tf.image.resize_images(grayscale_image, 250, 151)
                resized_image = tf.image.resize_images(grayscale_image,
                                                       [250, 151])

                # tf.cast is used here because the resized images are floats
                # but haven't been converted into image floats where an RGB
                # value is between [0,1).
                image_bytes = sess.run(tf.cast(resized_image,
                                               tf.uint8)).tobytes()

                # Instead of using the label as a string, it'd be more
                # efficient to turn it into either an integer index or a
                # one-hot encoded rank one tensor.
                # https://en.wikipedia.org/wiki/One-hot
                image_label = breed.encode(
                    "utf-8")  # Is this to guard against non-ASCII characters? -- Xiao Meng, 2017-11-01 16:24:17

                example = tf.train.Example(
                    features=tf.train.Features(feature={
                        'label':
                        tf.train.Feature(bytes_list=tf.train.BytesList(
                            value=[image_label])),
                        'image':
                        tf.train.Feature(bytes_list=tf.train.BytesList(
                            value=[image_bytes]))
                    }))
                writer.write(example.SerializeToString())
            current_index += 1
    writer.close()
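As mentioned in note (1), the two output directories must exist before the first TFRecordWriter is created. A minimal sketch that creates them from Python instead of by hand, matching the ./output layout used above:

import os

# Create the output directories if they do not already exist, so that
# tf.python_io.TFRecordWriter does not fail on its first file.
for directory in ["./output/testing-images", "./output/training-images"]:
    if not os.path.exists(directory):
        os.makedirs(directory)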

2. Training and testing
The code is as follows:

# -*- coding: utf-8 -*-"""Created on Thu Nov  2 09:29:57 2017@author: XM读取第1步保存的tensorflow record文件,并进行训练程序中提到的书本均指《面向机器智能的tensorflow实践》"""import tensorflow as tffrom tensorflow.python.ops import random_opsimport globBATCH_SIZE = 10IMAGE_WIDTH = 250IMAGE_HEIGHT = 151#———————————————————————————————————————图像预处理————————————————————————————————————————————#从文件队列中读取batch_size个文件,用于训练或测试def read_tfrecord(serialized, batch_size):    #parse_single_example解析器将中的example协议内存块解析为张量,    #每个tfrecord中有多幅图片,但parse_single_example只提取单个样本,    #parse_single_example只是解析tfrecord,并不对图像进行解码    features = tf.parse_single_example(        serialized,        features={            'label': tf.FixedLenFeature([], tf.string),            'image': tf.FixedLenFeature([], tf.string),        })    #将图像文件解码为uint8,因为所有通道的信息都处于0~255,然后reshape    record_image = tf.decode_raw(features['image'], tf.uint8)    image = tf.reshape(record_image, [IMAGE_WIDTH, IMAGE_HEIGHT, 1])    #将label平化为字符串    label = tf.cast(features['label'], tf.string)    #用于生成batch的缓冲队列的大小,下面采用的是经验公式    min_after_dequeue = 1000    capacity = min_after_dequeue + 3 * batch_size    #生成image_batch和label_batch    image_batch, label_batch = tf.train.shuffle_batch(        [image, label],        batch_size=batch_size,        capacity=capacity,        min_after_dequeue=min_after_dequeue)    return image_batch, label_batch# Converting the images to a float of [0,1) to match the expected input to convolution2ddef convert_image(image_batch):    return (tf.image.convert_image_dtype(image_batch, tf.float32))# Match every label from label_batch and return the index where they exist in the list of classesdef find_index_label(label_batch):    return (tf.map_fn(        lambda l: tf.where(tf.equal(labels_all, l))[0, 0:1][0],        label_batch,        dtype=tf.int64))#————————————————————————————————————————创建CNN————————————————————————————————————————————————#占位符,None代表输入的数据个数不确定image_holder = tf.placeholder(tf.float32,                              [BATCH_SIZE, IMAGE_WIDTH, IMAGE_HEIGHT, 1])label_holder = tf.placeholder(tf.int64, [BATCH_SIZE])keep_prob_holder = tf.placeholder(tf.float32)  #dropout保留的比例#此部分代码是创建卷积层时weights_initializer用到的初始化函数,#书中代码没有此部分,是新添加的def weights_initializer_random_normal(shape,                                      dtype=tf.float32,                                      partition_info=None):    return random_ops.random_normal(shape)#h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)#第1层卷积————————————————————————with tf.name_scope("conv1") as scope:    #这里用的是高级层,而不是标准层tf.nn.conv2d,二者的区别见书本第5.3.5节    conv2d_layer_one = tf.contrib.layers.convolution2d(        image_holder,        #产生滤波器的数量,书中代码有误        num_outputs=32,        #num_output_channels=32,        #核尺寸        kernel_size=(5, 5),        #激活函数        activation_fn=tf.nn.relu,        #权值初始化,书中代码有误:        #1、weight_init应该是weights_initializer;        #2、写成tf.random_normal会报错:random_normal() got an unexpected keyword argument 'partition_info',        weights_initializer=weights_initializer_random_normal,        #    weight_init=tf.random_normal,        stride=(2, 2),        trainable=True)#第1层池化————————————————————————————————with tf.name_scope("pool1") as scope:    pool_layer_one = tf.nn.max_pool(        conv2d_layer_one,        ksize=[1, 2, 2, 1],        strides=[1, 2, 2, 1],        padding='SAME')#第2层卷积————————————————————————————————with tf.name_scope("conv2") as scope:    conv2d_layer_two = tf.contrib.layers.convolution2d(        pool_layer_one,        #修改,原因同第1层        num_outputs=64,        
#num_output_channels=64,        kernel_size=(5, 5),        activation_fn=tf.nn.relu,        #修改,原因同第1层        weights_initializer=weights_initializer_random_normal,        #weight_init=tf.random_normal,        stride=(1, 1),        trainable=True)#第2层池化————————————————————————————————with tf.name_scope("pool2") as scope:    pool_layer_two = tf.nn.max_pool(        conv2d_layer_two,        ksize=[1, 2, 2, 1],        strides=[1, 2, 2, 1],        padding='SAME')#展开层,展开为秩1张量——————————————————————with tf.name_scope("flat") as scope:    flattened_layer_two = tf.reshape(pool_layer_two, [BATCH_SIZE, -1])#全连接层1—————————————————————————————————with tf.name_scope("full_connect1") as scope:    hidden_layer_three = tf.contrib.layers.fully_connected(            flattened_layer_two,            512,            #修改,原因同第1层            weights_initializer=lambda i, dtype, partition_info=None: tf.truncated_normal([38912, 512], stddev=0.1),            #weight_init=lambda i, dtype: tf.truncated_normal([38912, 512], stddev=0.1),            activation_fn=tf.nn.relu)    #小trick:dropout    hidden_layer_three = tf.nn.dropout(hidden_layer_three, keep_prob_holder)#全连接层2—————————————————————————————————with tf.name_scope("full_connect2") as scope:    final_fully_connected = tf.contrib.layers.fully_connected(            hidden_layer_three,            120,            #修改,原因同第1层            weights_initializer=lambda i, dtype, partition_info=None: tf.truncated_normal([512, 120], stddev=0.1)            #weight_init=lambda i, dtype: tf.truncated_normal([512, 120], stddev=0.1)            )#输出———————————————————————with tf.name_scope("output") as scope:    logits = final_fully_connected    #查找排名第1的分类结果是否是实际的种类    top_k_op = tf.nn.in_top_k(logits, label_holder, 1)#————————————————————————————————————————loss————————————————————————————————————————————————#计算交叉熵def loss(logits, labels):    #按照tensorflow1.0以上版本修改    #logits是全连接层的输出,不需softmax归一化,因为sparse_softmax_cross_entropy_with_logits函数会先将logits进行softmax归一化,然后与label表示的onehot向量比较,计算交叉熵。    return tf.reduce_mean(        tf.nn.sparse_softmax_cross_entropy_with_logits(            logits=logits, labels=labels))#————————————————————————————————————————training———————————————————————————————————————————————#模型训练def training(loss_value, learning_rate, batch):    return tf.train.AdamOptimizer(learning_rate, 0.9).minimize(        loss_value, global_step=batch)#————————————————————————————————————————主函数——————————————————————————————————————————————————if __name__ == '__main__':    #下面的几句是我添加的,因为我这里读到的路径形式为:'./imagenet-dogs\\n02085620-Chihuahua\\',路径分隔符中除第1个之外,都是2个反斜杠,与例程不一致。这里将2个反斜杠替换为斜杠。    #glob.glob 用于获取所有匹配的路径    glob_path = glob.glob("./imagenet-dogs/*")    glob_path2 = list(map(lambda image: image.replace('\\', '/'), glob_path))    #读取所有的label,形式为n02085620-Chihuahua....    
labels_all = list(map(lambda c: c.split("/")[-1], glob_path2))    #将所有的文件名列表(由函数tf.train.match_filenames_once匹配产生)    #生成一个队列,供后面的文件阅读器reader读取    #训练数据队列    filename_queue_train = tf.train.string_input_producer(        tf.train.match_filenames_once("./output/training-images/*.tfrecords"))    #测试数据队列    filename_queue_test = tf.train.string_input_producer(        tf.train.match_filenames_once("./output/testing-images/*.tfrecords"))    #创建tfrecord阅读器,并读取数据。    #默认shuffle=True,将文件打乱    reader = tf.TFRecordReader()    _, serialized_train = reader.read(filename_queue_train)    _, serialized_test = reader.read(filename_queue_test)    #读取训练数据——————————————————————————————————    train_image_batch, train_label_batch = read_tfrecord(        serialized_train, BATCH_SIZE)    # Converting the images to a float of [0,1) to match the expected input to convolution2d    train_images_op = convert_image(train_image_batch)    # Match every label from label_batch and return the index where they exist in the list of classes    train_labels_op = find_index_label(train_label_batch)    #读取测试数据——————————————————————————————————    test_image_batch, test_label_batch = read_tfrecord(serialized_test,                                                       BATCH_SIZE)    # Converting the images to a float of [0,1) to match the expected input to convolution2d    test_images_op = convert_image(test_image_batch)    # Match every label from label_batch and return the index where they exist in the list of classes    test_labels_op = find_index_label(test_label_batch)    #————————————————————————————————————————————    batch = tf.Variable(0)    learning_rate = tf.train.exponential_decay(        0.01, batch * 3, 120, 0.95, staircase=True)    loss_op = loss(logits, train_labels_op)    train_op = training(loss_op, learning_rate, batch)    sess = tf.InteractiveSession()    #必须同时有全局变量和局部变量的初始化,不然会报错:    #OutOfRangeError (see above for traceback): RandomShuffleQueue '_134_shuffle_batch_8/random_shuffle_queue' is closed and has insufficient elements (requested 3, current size 0)    sess.run(tf.local_variables_initializer())    sess.run(tf.global_variables_initializer())    #声明一个Coordinator类来协同多个线程    coord = tf.train.Coordinator()    # 开始 Queue Runners (队列运行器)    threads = tf.train.start_queue_runners(sess=sess, coord=coord)    #执行训练————————————————————————————————————————————    for j in range(1000):        train_images = sess.run(train_images_op)        train_labels = sess.run(train_labels_op)        train_logits, train_result, _ = sess.run(            [logits, top_k_op, train_op],            feed_dict={                image_holder: train_images,                label_holder: train_labels,                keep_prob_holder: 0.1            })        if j % 10 == 0:            #            print(train_labels)            #            print(train_result)            print("loss = ",                  sess.run(                      loss_op,                      feed_dict={                          image_holder: train_images,                          label_holder: train_labels,                          keep_prob_holder: 0.1                      }), 't=', j)    #测试————————————————————————————————————————————    #每次的准确率    accurary_once_op = tf.reduce_mean(tf.cast(top_k_op, tf.float32))    #测试轮数    test_num = 0    #测试总准确度    accuracy_total = 0    for i in range(100):        test_images = sess.run(test_images_op)        test_labels = sess.run(test_labels_op)        accuracy_once = sess.run(            accurary_once_op,            feed_dict={   
             image_holder: test_images,                label_holder: test_labels,                keep_prob_holder: 1.0            })        accuracy_total = accuracy_total + accuracy_once        test_num = test_num + 1        if i % 10 == 0:            print("第", i, "轮测试,准确率为:", accuracy_total / test_num)    print("总准确率为:", accuracy_total / test_num)    #        if i%10 == 0:    #            print("次数:",i,"————————————————————————————————")    #            print(test_labels)    #            print(test_result)    #结束————————————————————————————————————————————    #通知其他线程退出    coord.request_stop()    #等待所有线程退出    coord.join(threads)    sess.close()

Both the number of training iterations and the number of test rounds can be adjusted.
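One thing the script above does not do is save the trained model, so the weights are lost once the session closes. A minimal checkpointing sketch with tf.train.Saver (the ./checkpoints directory is illustrative and, like ./output, must exist beforehand):

# Build the saver after all variables are defined; call save() inside or
# after the training loop.
saver = tf.train.Saver()
save_path = saver.save(sess, "./checkpoints/stanford-dogs.ckpt",
                       global_step=batch)
print("Model saved to:", save_path)

# To evaluate later without retraining:
# saver.restore(sess, save_path)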
