TFRecord-format data and cifar-like bin-format files


Main references:
1. TFRecord: https://github.com/kevin28520/My-TensorFlow-tutorials
2. Bin file creation: http://blog.csdn.net/YhL_Leo/article/details/50801226 (in C++)
(Parts of that code need small modifications. I am a C++ novice, so I want to thank my colleague Guo ** for the help. I also tried the Python way of producing the bin file; it does produce one, but when it is read with the method in https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10 the labels always come out wrong. I tried to fix it but haven't managed so far. The C++ code works after some simple modifications.)
I had made TFRecord-format image data before but never got around to verifying it with TensorFlow. This week I finally had some time, so I wrote a simple CNN to verify the generated TFRecord data, and I also converted the image data into cifar10-like data (binary bin files).

1. TFRecord-format data
I wrote about making TFRecord data once before, but I was busy with other things at the time and, working on it in fits and starts, never really got a feel for it. This week I spent proper time implementing it. The implementation is just simple modifications on top of other people's code.
Code:

# -*- coding: utf-8 -*-
"""
Created on Sat May 20 10:49:19 2017

@author: xx
"""
import tensorflow as tf
import numpy as np
import os
import matplotlib.pyplot as plt
#import skimage.io as io
import cv2


def get_file(file_dir):
    images = []
    temp = []
    for root, sub_folders, files in os.walk(file_dir):
        # image files
        for name in files:
            images.append(os.path.join(root, name))
        # sub-folder names (one folder per class)
        for name in sub_folders:
            temp.append(os.path.join(root, name))

    # assign labels based on the folder names (four classes here: '0'..'3')
    labels = []
    for one_folder in temp:
        n_img = len(os.listdir(one_folder))
        letter = one_folder.split('\\')[-1]
        if letter == '0':
            labels = np.append(labels, n_img * [1])
        elif letter == '1':
            labels = np.append(labels, n_img * [2])
        elif letter == '2':
            labels = np.append(labels, n_img * [3])
        elif letter == '3':
            labels = np.append(labels, n_img * [4])

    # shuffle
    temp = np.array([images, labels])
    temp = temp.transpose()
    np.random.shuffle(temp)

    image_list = list(temp[:, 0])
    label_list = list(temp[:, 1])
    label_list = [int(float(i)) for i in label_list]
    return image_list, label_list


def int64_feature(value):
    """Wrapper for inserting int64 features into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def convert_to_tfrecord(images, labels, save_dir, name):
    filename = os.path.join(save_dir, name + '.tfrecords')
    n_samples = len(labels)
    if np.shape(images)[0] != n_samples:
        raise ValueError('Images size %d does not match label size %d.' % (images.shape[0], n_samples))

    # wait some time here, transforming takes a while depending on the size of your data
    writer = tf.python_io.TFRecordWriter(filename)
    print('\nTransform start......')
    for i in np.arange(0, n_samples):
        try:
            image = cv2.imread(images[i])  # type(image) must be array!
            image = np.array(cv2.resize(image, (100, 100)))
            image_raw = image.tostring()
            label = int(labels[i])
            example = tf.train.Example(features=tf.train.Features(feature={
                'label': int64_feature(label),
                'image_raw': bytes_feature(image_raw)}))
            writer.write(example.SerializeToString())
        except IOError as e:
            print('Could not read:', images[i])
            print('error: %s' % e)
            print('Skip it!\n')
    writer.close()
    print('Transform done!')


def read_and_decode(tfrecords_file, batch_size):
    '''read and decode tfrecord file, generate (image, label) batches
    Args:
        tfrecords_file: the directory of tfrecord file
        batch_size: number of images in each batch
    Returns:
        image: 4D tensor - [batch_size, width, height, channel]
        label: 1D tensor - [batch_size]
    '''
    # make an input queue from the tfrecord file
    filename_queue = tf.train.string_input_producer([tfrecords_file])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    img_features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'image_raw': tf.FixedLenFeature([], tf.string),
        })
    image = tf.decode_raw(img_features['image_raw'], tf.uint8)

    ##########################################################
    # you can put data augmentation here, I didn't use it
    ##########################################################

    # the images were resized to 100*100*3 when the TFRecord was written;
    # change the shape here if you use another size
    image = tf.reshape(image, [100, 100, 3])
    label = tf.cast(img_features['label'], tf.int32)
    image_batch, label_batch = tf.train.batch([image, label],
                                              batch_size=batch_size,
                                              num_threads=64,
                                              capacity=2000)
    return image_batch, tf.reshape(label_batch, [batch_size])


test_dir = r'F:\tensorflow\example1_tfreocds\handclassifier\hand_data'
save_dir = r'F:\tensorflow\example1_tfreocds\handclassifier'
'''
test_dir = r'F:\tensorflow\example1_tfreocds\handclassifier\handclassifier\My-TensorFlow-tutorials-master\03 TFRecord\Data\notMNIST_small'
save_dir = r'F:\tensorflow\example1_tfreocds\handclassifier\handclassifier\My-TensorFlow-tutorials-master\03 TFRecord\Data'
'''
BATCH_SIZE = 25

# Convert the data: you just need to run it ONCE!
name_test = 'hand_TF'
images, labels = get_file(test_dir)
convert_to_tfrecord(images, labels, save_dir, name_test)


def plot_images(images, labels):
    '''plot one batch of images'''
    for i in np.arange(0, BATCH_SIZE):
        plt.subplot(5, 5, i + 1)
        plt.axis('off')
        # maps label 1..n to a letter (left over from the notMNIST tutorial)
        plt.title(chr(ord('A') + labels[i] - 1), fontsize=14)
        plt.subplots_adjust(top=1.5)
        plt.imshow(images[i])
    plt.show()


tfrecords_file = r"F:\tensorflow\example1_tfreocds\handclassifier\hand_TF.tfrecords"
#tfrecords_file = r'F:\tensorflow\example1_tfreocds\handclassifier\handclassifier\My-TensorFlow-tutorials-master\03 TFRecord\Data\test.tfrecords'
image_batch, label_batch = read_and_decode(tfrecords_file, batch_size=BATCH_SIZE)

with tf.Session() as sess:
    i = 0
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            # just plot one batch at a time
            image, label = sess.run([image_batch, label_batch])
            plot_images(image, label)
            i += 1
    except tf.errors.OutOfRangeError:
        print('done!')
    finally:
        coord.request_stop()
    coord.join(threads)
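As a quick sanity check (my addition, not part of the original post), the generated file can be inspected with the TF 1.x record iterator before feeding it to a network. The path below is the one written by the conversion code above, and the expected 100*100*3 = 30000 image bytes come from the resize in convert_to_tfrecord:

import tensorflow as tf

tfrecords_file = r"F:\tensorflow\example1_tfreocds\handclassifier\hand_TF.tfrecords"

count = 0
last_record = None
for serialized in tf.python_io.tf_record_iterator(tfrecords_file):
    count += 1
    last_record = serialized

example = tf.train.Example()
example.ParseFromString(last_record)          # parse the last record as a sample
label = example.features.feature['label'].int64_list.value[0]
n_bytes = len(example.features.feature['image_raw'].bytes_list.value[0])
# with the 100x100 resize above, n_bytes should be 100*100*3 = 30000
print('records:', count, 'last label:', label, 'image bytes:', n_bytes)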

CNN implementation:

import tensorflow as tf
import time

batchsize = 64
tfrecords_file = r'F:\tensorflow\example1_tfreocds\handclassifier\handclassifier\My-TensorFlow-tutorials-master\03 TFRecord\Data\test.tfrecords'


def read_and_decode(tfrecords_file, batch_size):
    '''read and decode tfrecord file, generate (image, label) batches
    Args:
        tfrecords_file: the directory of tfrecord file
        batch_size: number of images in each batch
    Returns:
        image: 4D tensor - [batch_size, width, height, channel]
        label: 1D tensor - [batch_size]
    '''
    # make an input queue from the tfrecord file
    filename_queue = tf.train.string_input_producer([tfrecords_file])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    img_features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'image_raw': tf.FixedLenFeature([], tf.string),
        })
    image = tf.decode_raw(img_features['image_raw'], tf.uint8)

    ##########################################################
    # you can put data augmentation here, I didn't use it
    ##########################################################

    # all the images of notMNIST are 28*28; change the image size if you use another dataset
    image = tf.reshape(image, [28, 28, 3])
    image = tf.cast(image, tf.float32) * (1. / 255)
    label = tf.cast(img_features['label'], tf.int32)
    image_batch, label_batch = tf.train.batch([image, label],
                                              batch_size=batch_size,
                                              num_threads=64,
                                              capacity=2000)
    #tf.reshape(label_batch, [batch_size])
    #label_batch = tf.one_hot(label_batch, depth=10)
    return image_batch, label_batch


image_batch1, label_batch1 = read_and_decode(tfrecords_file, batch_size=batchsize)

##### the CNN itself is omitted here #####

sess = tf.InteractiveSession()
saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())
#sess.run(tf.local_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for step in range(1000):
    #start_time = time.time()
    image_batch, label_batch = sess.run([image_batch1, label_batch1])
    tf.train.write_graph(sess.graph_def, '.', 'tfdroid.pbtxt')
    # labels in the TFRecord start at 1, so shift them to 0-based for the loss
    _, loss_value = sess.run([train_op, loss],
                             feed_dict={image_holder: image_batch, label_holder: label_batch - 1})
    #duration = float(time.time - start_time)
    if step % 1000 == 0:
        print(step, loss_value)
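The training loop above feeds image_holder and label_holder and runs train_op and loss, but the network that defines them is the part the author left out. Purely as a sketch to make the snippet self-contained (it would sit at the "the CNN itself is omitted here" marker, before the session is created): the two conv layers, filter counts, 10 classes and the Adam learning rate below are my assumptions, not the author's actual network; only the four names and the 28*28*3 input shape come from the code above.

# A minimal placeholder CNN (assumed, not the author's); it only has to define
# image_holder, label_holder, loss and train_op for the training loop to run.
image_holder = tf.placeholder(tf.float32, [batchsize, 28, 28, 3])
label_holder = tf.placeholder(tf.int32, [batchsize])

conv1 = tf.layers.conv2d(image_holder, 32, 5, padding='same', activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(conv1, 2, 2)
conv2 = tf.layers.conv2d(pool1, 64, 5, padding='same', activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(conv2, 2, 2)
flat = tf.reshape(pool2, [batchsize, -1])
fc1 = tf.layers.dense(flat, 384, activation=tf.nn.relu)
logits = tf.layers.dense(fc1, 10)   # 10 classes, assuming notMNIST labels 0..9 after the -1 shift

cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label_holder, logits=logits)
loss = tf.reduce_mean(cross_entropy)
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)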

2. Bin file creation
The cifar dataset comes up all the time while learning, and calling it turned out to be very simple. While searching for material on the TFRecord format I came across how bin-format files are made, so I studied that too.
Code:

The C++ code is fairly long, so if you are interested please see the link above. The C++ program only generates one bin file at a time, which feels a bit inconvenient; since I'm not very skilled, I simply brute-forced the several bin files into a single one. The code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Thu May 25 11:01:54 2017

@author: xx
"""
import os
import numpy as np
import tensorflow as tf
import cifar10_input

f_out = open(r"F:\tensorflow\example1_tfreocds\handclassifier\all_train_data.bin", "wb")
pad_data = [0xFF for i in range(7000)]
for fileCnt in range(4):
    fileCnts = fileCnt
    file_name = r"F:\tensorflow\example1_tfreocds\handclassifier\bin_out_" + str(fileCnts) + ".bin"
    if os.path.exists(file_name):
        #print(fileCnts, "exists")
        f_out.write(open(file_name, "rb").read())
    else:
        # pad with 0xFF bytes if one of the four expected bin files is missing
        f_out.write(bytearray(pad_data))
f_out.close()
print("OK")
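For reference, each record in the CIFAR-10 binary format is 1 label byte followed by 3072 image bytes (the 32*32 red plane, then the green plane, then the blue plane), which is the layout cifar10_input.py expects. Below is a minimal Python sketch (my addition, under those assumptions; the function name and the fixed 32x32 size are illustrative, and it does not reproduce the C++ tool from the link) of writing such records directly. Note that OpenCV loads images as BGR, so the planes have to be reordered to RGB before writing.

import cv2
import numpy as np

def write_cifar_style_bin(image_paths, labels, out_file, size=32):
    # one record per image: <1 label byte><size*size R bytes><G bytes><B bytes>
    with open(out_file, "wb") as f:
        for path, label in zip(image_paths, labels):
            img = cv2.imread(path)                      # uint8, BGR, HxWx3
            img = cv2.resize(img, (size, size))
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # CIFAR stores RGB planes
            planes = img.transpose(2, 0, 1)             # HWC -> CHW so each colour plane is contiguous
            f.write(bytearray([label & 0xFF]))          # label byte
            f.write(planes.astype(np.uint8).tobytes())  # 3*size*size image bytes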

CNN implementation:

import cifar10_input
import tensorflow as tf
import time

data_dir = r"F:\tensorflow\example1_tfreocds\handclassifier"
batchsize = 64
max_steps = 100000

image_train, labels_train = cifar10_input.distorted_inputs(data_dir=data_dir, batch_size=batchsize)

##### CNN code omitted #####

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
tf.train.start_queue_runners()

for step in range(max_steps):
    #start_time = time.time()
    image_batch, label_batch = sess.run([image_train, labels_train])
    _, loss_value = sess.run([train_op, loss],
                             feed_dict={image_holder: image_batch, label_holder: label_batch})
    #duration = float(time.time - start_time)
    if step % 10 == 0:
        print(step, loss_value)
#        examples_per_sec = batchsize / duration
#        sec_per_batch = float(duration)
#        format_str = ('step %d, loss=%0.2f (%0.1f example/sec; %0.3f sec/batch)')
#        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))

(Uncommenting that timing code produces an error; I googled for a whole day without finding the cause. If anyone has run into this and solved it, please let me know.)
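A likely cause (my guess, not confirmed in the original post): the commented-out line computes float(time.time - start_time), which subtracts a float from the time.time function object itself instead of calling time.time(), and that raises a TypeError as soon as it is uncommented. With the call added, the timing part of the loop above would look roughly like this:

for step in range(max_steps):
    start_time = time.time()             # time.time() returns the current timestamp as a float
    image_batch, label_batch = sess.run([image_train, labels_train])
    _, loss_value = sess.run([train_op, loss],
                             feed_dict={image_holder: image_batch, label_holder: label_batch})
    duration = time.time() - start_time  # the original wrote time.time - start_time
    if step % 10 == 0:
        examples_per_sec = batchsize / duration
        sec_per_batch = float(duration)
        format_str = 'step %d, loss=%0.2f (%0.1f examples/sec; %0.3f sec/batch)'
        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))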