TFRecord-format data and CIFAR-like bin-format files
Source: the Internet. Published by: 知恩感恩手抄报. Edited by: 程序博客网. Date: 2024/03/29 09:27
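Main references:
1. TFRecord: https://github.com/kevin28520/My-TensorFlow-tutorials
2. Making bin files: http://blog.csdn.net/YhL_Leo/article/details/50801226 (written in C++)
(Parts of that code need minor changes. I am a C++ beginner, so thanks to my colleague Guo** for the help. I also tried building the bin file in Python. That does produce a bin file, but when I read it with the method from https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10 the labels always come out wrong. I tried to fix this, without success so far. The C++ version works after minor changes.)
I had already converted my images to TFRecord format but never verified the result with TensorFlow. I had time this week, so I wrote a simple CNN to check the generated TFRecord data, and I also converted the images into a CIFAR-10-like format (a binary bin file).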
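I. TFRecord-format data
I wrote about making TFRecord data once before, but I was busy with other work at the time, did it in fits and starts, and never got a feel for it. This week I took the time to implement it properly. The implementation is a lightly modified version of someone else's code.
Code: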
```python
# -*- coding: utf-8 -*-
"""
Created on Sat May 20 10:49:19 2017

@author: xx
"""
import tensorflow as tf
import numpy as np
import os
import matplotlib.pyplot as plt
#import skimage.io as io
import cv2


def get_file(file_dir):
    images = []
    temp = []
    for root, sub_folders, files in os.walk(file_dir):
        # image paths
        for name in files:
            images.append(os.path.join(root, name))
        # sub-folder names (one folder per class)
        for name in sub_folders:
            temp.append(os.path.join(root, name))

    # assign labels 1..4 based on the folder names '0'..'3'
    labels = []
    for one_folder in temp:
        n_img = len(os.listdir(one_folder))
        letter = one_folder.split('\\')[-1]  # Windows path separator

        if letter == '0':
            labels = np.append(labels, n_img*[1])
        elif letter == '1':
            labels = np.append(labels, n_img*[2])
        elif letter == '2':
            labels = np.append(labels, n_img*[3])
        elif letter == '3':
            labels = np.append(labels, n_img*[4])

    # shuffle
    temp = np.array([images, labels])
    temp = temp.transpose()
    np.random.shuffle(temp)

    image_list = list(temp[:, 0])
    label_list = list(temp[:, 1])
    label_list = [int(float(i)) for i in label_list]

    return image_list, label_list


def int64_feature(value):
    """Wrapper for inserting int64 features into Example proto."""
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))


def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def convert_to_tfrecord(images, labels, save_dir, name):
    filename = os.path.join(save_dir, name + '.tfrecords')
    n_samples = len(labels)

    # images is a plain list of paths, so use len() rather than .shape
    if len(images) != n_samples:
        raise ValueError('Images size %d does not match label size %d.'
                         % (len(images), n_samples))

    # wait some time here, transforming need some time based on the size of your data.
    writer = tf.python_io.TFRecordWriter(filename)
    print('\nTransform start......')
    for i in np.arange(0, n_samples):
        try:
            image = cv2.imread(images[i])
            # cv2.imread returns None instead of raising when a file cannot be read
            if image is None:
                raise IOError('cv2.imread returned None')
            #print(i)
            # type(image) must be array!
            #print(image.shape)
            image = np.array(cv2.resize(image, (100, 100)))
            image_raw = image.tobytes()
            label = int(labels[i])
            example = tf.train.Example(features=tf.train.Features(feature={
                'label': int64_feature(label),
                'image_raw': bytes_feature(image_raw)}))
            writer.write(example.SerializeToString())
        except IOError as e:
            print('Could not read:', images[i])
            print('error: %s' % e)
            print('Skip it!\n')
    writer.close()
    print('Transform done!')


def read_and_decode(tfrecords_file, batch_size):
    '''read and decode tfrecord file, generate (image, label) batches
    Args:
        tfrecords_file: the directory of tfrecord file
        batch_size: number of images in each batch
    Returns:
        image: 4D tensor - [batch_size, width, height, channel]
        label: 1D tensor - [batch_size]
    '''
    # make an input queue from the tfrecord file
    filename_queue = tf.train.string_input_producer([tfrecords_file])

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    img_features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'image_raw': tf.FixedLenFeature([], tf.string),
        })
    image = tf.decode_raw(img_features['image_raw'], tf.uint8)

    ##########################################################
    # you can put data augmentation here, I didn't use it
    ##########################################################
    # the images were resized to 100x100x3 when written; this shape must
    # match the size used in convert_to_tfrecord
    image = tf.reshape(image, [100, 100, 3])
    #print(image)
    label = tf.cast(img_features['label'], tf.int32)
    image_batch, label_batch = tf.train.batch([image, label],
                                              batch_size=batch_size,
                                              num_threads=64,
                                              capacity=2000)
    return image_batch, tf.reshape(label_batch, [batch_size])


test_dir = r'F:\tensorflow\example1_tfreocds\handclassifier\hand_data'
save_dir = r'F:\tensorflow\example1_tfreocds\handclassifier'
# test_dir = r'F:\tensorflow\example1_tfreocds\handclassifier\handclassifier\My-TensorFlow-tutorials-master\03 TFRecord\Data\notMNIST_small'
# save_dir = r'F:\tensorflow\example1_tfreocds\handclassifier\handclassifier\My-TensorFlow-tutorials-master\03 TFRecord\Data'

BATCH_SIZE = 25

# Convert test data: you just need to run it ONCE!
name_test = 'hand_TF'
images, labels = get_file(test_dir)
convert_to_tfrecord(images, labels, save_dir, name_test)


def plot_images(images, labels):
    '''plot one batch size
    '''
    for i in np.arange(0, BATCH_SIZE):
        plt.subplot(5, 5, i + 1)
        plt.axis('off')
        plt.title(chr(ord('A') + labels[i] - 1), fontsize=14)
        plt.subplots_adjust(top=1.5)
        # cv2 loads images as BGR, so colors appear swapped here
        plt.imshow(images[i])
    plt.show()


tfrecords_file = r"F:\tensorflow\example1_tfreocds\handclassifier\hand_tf.tfrecords"
#tfrecords_file = r'F:\tensorflow\example1_tfreocds\handclassifier\handclassifier\My-TensorFlow-tutorials-master\03 TFRecord\Data\test.tfrecords'
image_batch, label_batch = read_and_decode(tfrecords_file, batch_size=BATCH_SIZE)

with tf.Session() as sess:
    i = 0
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    try:
        # just plot one batch size
        while not coord.should_stop() and i < 1:
            image, label = sess.run([image_batch, label_batch])
            plot_images(image, label)
            i += 1
    except tf.errors.OutOfRangeError:
        print('done!')
    finally:
        coord.request_stop()
    coord.join(threads)
```
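The label assignment in `get_file` depends on splitting the folder path on `'\\'`, which only works with Windows-style paths, and a folder whose name is not in the `if/elif` chain contributes images but no labels. A minimal sketch of a more portable lookup follows. The names `FOLDER_TO_LABEL` and `label_for_folder` and the example paths are hypothetical and not part of the original code; the mapping itself mirrors the folders `'0'` to `'3'` and labels 1 to 4 used above.

```python
# Hypothetical mapping: folder name -> 1-based class label, mirroring the
# if/elif chain in get_file() (folders '0'..'3' become labels 1..4).
FOLDER_TO_LABEL = {'0': 1, '1': 2, '2': 3, '3': 4}


def label_for_folder(folder_path):
    """Return the label for a class folder, or None if the name is unknown."""
    # Normalise separators so both 'a\\b\\0' and 'a/b/0' work on any OS.
    name = folder_path.replace('\\', '/').rstrip('/').split('/')[-1]
    return FOLDER_TO_LABEL.get(name)


# Example paths are made up for illustration.
print(label_for_folder(r'F:\data\hand_data\2'))    # 3
print(label_for_folder('/data/hand_data/0'))       # 1
print(label_for_folder('/data/hand_data/other'))   # None
```

Returning `None` for an unknown folder lets the caller skip it explicitly, rather than ending up with fewer labels than images.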
CNN implementation:
```python
import tensorflow as tf
import time

batchsize = 64
tfrecords_file = r'F:\tensorflow\example1_tfreocds\handclassifier\handclassifier\My-TensorFlow-tutorials-master\03 TFRecord\Data\test.tfrecords'


def read_and_decode(tfrecords_file, batch_size):
    '''read and decode tfrecord file, generate (image, label) batches
    Args:
        tfrecords_file: the directory of tfrecord file
        batch_size: number of images in each batch
    Returns:
        image: 4D tensor - [batch_size, width, height, channel]
        label: 1D tensor - [batch_size]
    '''
    # make an input queue from the tfrecord file
    filename_queue = tf.train.string_input_producer([tfrecords_file])

    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    img_features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'image_raw': tf.FixedLenFeature([], tf.string),
        })
    image = tf.decode_raw(img_features['image_raw'], tf.uint8)

    ##########################################################
    # you can put data augmentation here, I didn't use it
    ##########################################################
    # all the images of notMNIST are 28*28, you need to change the image size if you use other dataset.
    image = tf.reshape(image, [28, 28, 3])
    image = tf.cast(image, tf.float32) * (1. / 255)
    #print(image)
    label = tf.cast(img_features['label'], tf.int32)
    image_batch, label_batch = tf.train.batch([image, label],
                                              batch_size=batch_size,
                                              num_threads=64,
                                              capacity=2000)
    #tf.reshape(label_batch, [batch_size])
    #label_batch = tf.one_hot(label_batch, depth=10)
    return image_batch, label_batch


image_batch1, label_batch1 = read_and_decode(tfrecords_file, batch_size=batchsize)

##### The CNN itself is omitted here. It defines image_holder, #####
##### label_holder, loss and train_op, which are used below.   #####

sess = tf.InteractiveSession()
saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())
#sess.run(tf.local_variables_initializer())

# start the queue runners once, tied to a coordinator
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# the graph does not change during training, so write it once
tf.train.write_graph(sess.graph_def, '.', 'tfdroid.pbtxt')

for step in range(1000):
    #start_time = time.time()
    image_batch, label_batch = sess.run([image_batch1, label_batch1])
    # labels were stored 1-based, so shift them to 0-based for the loss
    _, loss_value = sess.run([train_op, loss],
                             feed_dict={image_holder: image_batch,
                                        label_holder: label_batch - 1})
    #duration = float(time.time() - start_time)
    if step % 1000 == 0:
        print(step, loss_value)

coord.request_stop()
coord.join(threads)
```
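The writer stores each image as raw bytes and the reader rebuilds it with `tf.decode_raw` followed by `tf.reshape`, so the shape passed to `tf.reshape` has to match the size used at write time (100x100x3 for the hand data, 28x28x3 for the notMNIST file). The sketch below reproduces that round trip with NumPy only, as an illustration of the idea rather than of the TensorFlow calls themselves; the random image is a stand-in for a real one.

```python
import numpy as np

# Stand-in for a resized image: 100x100 pixels, 3 channels, uint8.
height, width, channels = 100, 100, 3
rng = np.random.RandomState(0)
image = rng.randint(0, 256, size=(height, width, channels)).astype(np.uint8)

# Writing side: the array is flattened to raw bytes (the 'image_raw' feature).
raw = image.tobytes()
print(len(raw))  # 30000 = 100 * 100 * 3

# Reading side: bytes -> flat uint8 vector -> original shape.
flat = np.frombuffer(raw, dtype=np.uint8)          # plays the role of tf.decode_raw
restored = flat.reshape(height, width, channels)   # plays the role of tf.reshape
print(np.array_equal(restored, image))  # True

# Same scaling as in the CNN input pipeline: uint8 -> float32 in roughly [0, 1].
scaled = restored.astype(np.float32) * (1.0 / 255)
print(scaled.dtype)  # float32

# A shape that does not match the stored size cannot be applied.
try:
    flat.reshape(28, 28, 3)
except ValueError as err:
    print('reshape failed:', err)
```

The raw bytes carry no shape information, which is why the size has to be known (or stored as an extra feature) on the reading side.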
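II. Making the bin file
The CIFAR dataset comes up often when studying, and loading it turned out to be very simple. While looking up material on the TFRecord format, I came across how bin files are made, so I studied that as well.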
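For reference, the binary version of CIFAR-10 stores each example as a fixed-length record: 1 label byte followed by 3072 pixel bytes (the 32x32 red plane, then green, then blue, each in row-major order). The sketch below packs and unpacks one such record with NumPy. It is a minimal illustration under the assumption of 32x32 RGB input, not the C++ tool from the reference link; `make_record` and `parse_record` are hypothetical names, and a different image size would need the same change on the reading side.

```python
import numpy as np

IMAGE_SIZE = 32
RECORD_BYTES = 1 + IMAGE_SIZE * IMAGE_SIZE * 3  # 1 label byte + 3072 pixel bytes


def make_record(image_hwc, label):
    """Pack one uint8 RGB image of shape (32, 32, 3) and a label into bytes."""
    if image_hwc.shape != (IMAGE_SIZE, IMAGE_SIZE, 3) or image_hwc.dtype != np.uint8:
        raise ValueError('expected a uint8 image of shape (32, 32, 3)')
    if not 0 <= label <= 255:
        raise ValueError('label must fit in one byte')
    # (H, W, C) -> (C, H, W): all red values first, then green, then blue.
    # Note: cv2.imread returns BGR, so such images need a channel swap first.
    planes = image_hwc.transpose(2, 0, 1)
    return bytes([label]) + planes.tobytes()


def parse_record(record):
    """Inverse of make_record: return (image_hwc, label)."""
    label = record[0]
    planes = np.frombuffer(record, dtype=np.uint8, offset=1)
    planes = planes.reshape(3, IMAGE_SIZE, IMAGE_SIZE)
    return planes.transpose(1, 2, 0), label


rng = np.random.RandomState(0)
image = rng.randint(0, 256, size=(IMAGE_SIZE, IMAGE_SIZE, 3)).astype(np.uint8)

record = make_record(image, 3)
print(len(record))  # 3073

decoded, label = parse_record(record)
print(label, np.array_equal(decoded, image))  # 3 True
```

Because the records have no delimiters, a single extra or missing byte shifts every later label, so the record length is one thing worth checking when a reader returns wrong labels.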
Code:
The C++ code is fairly long, so if you are interested, see the link above. It can only generate one bin file per run, which felt cumbersome. Being a beginner, I took the brute-force route and concatenated the individual bin files into one. The code is as follows:

```python
# -*- coding: utf-8 -*-
"""
Created on Thu May 25 11:01:54 2017

@author: xx
"""
import os

out_name = r"F:\tensorflow\example1_tfreocds\handclassifier\all_train_data.bin"
# placeholder bytes written when an input file is missing
pad_data = [0xFF for i in range(7000)]

with open(out_name, "wb") as f_out:
    for fileCnt in range(4):
        file_name = r"F:\tensorflow\example1_tfreocds\handclassifier\bin_out_" + str(fileCnt) + ".bin"
        if os.path.exists(file_name):
            #print(fileCnt, "exists")
            with open(file_name, "rb") as f_in:
                f_out.write(f_in.read())
        else:
            f_out.write(bytearray(pad_data))
print("OK")
```
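Since a fixed-length reader relies on every record starting at a multiple of the record size, it can help to check that alignment while merging. The following sketch is a variation on the merge script, not the original: `merge_bin_files` and its `record_bytes` parameter are hypothetical, and it skips a missing file instead of writing padding, because padding is only safe when its length is a whole number of records. The demo uses tiny fake files with a made-up 5-byte record.

```python
import os
import tempfile


def merge_bin_files(paths, out_path, record_bytes):
    """Concatenate bin files into out_path; return the number of records written."""
    total = 0
    with open(out_path, 'wb') as f_out:
        for path in paths:
            if not os.path.exists(path):
                print('skipping missing file:', path)
                continue
            size = os.path.getsize(path)
            # Every input must hold a whole number of fixed-length records.
            if size % record_bytes != 0:
                raise ValueError('%s: size %d is not a multiple of %d'
                                 % (path, size, record_bytes))
            with open(path, 'rb') as f_in:
                f_out.write(f_in.read())
            total += size // record_bytes
    return total


# Demo with two tiny fake files (record = 1 label byte + 4 data bytes).
tmp_dir = tempfile.mkdtemp()
parts = []
for idx, n_records in enumerate([2, 3]):
    part = os.path.join(tmp_dir, 'bin_out_%d.bin' % idx)
    with open(part, 'wb') as f:
        f.write(bytes([idx, 0, 0, 0, 0]) * n_records)
    parts.append(part)
parts.append(os.path.join(tmp_dir, 'bin_out_2.bin'))  # deliberately missing

merged = os.path.join(tmp_dir, 'all_train_data.bin')
n_merged = merge_bin_files(parts, merged, record_bytes=5)
print(n_merged, os.path.getsize(merged))  # 5 25
```

For real data, `record_bytes` would be whatever the reader expects, for example 3073 for unmodified CIFAR-10 records.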
CNN implementation:
```python
import cifar10_input
import tensorflow as tf
import time

data_dir = r"F:\tensorflow\example1_tfreocds\handclassifier"
batchsize = 64
max_steps = 100000

image_train, labels_train = cifar10_input.distorted_inputs(data_dir=data_dir,
                                                           batch_size=batchsize)

###### The CNN itself is omitted here. It defines image_holder, #####
###### label_holder, loss and train_op, which are used below.   #####

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
tf.train.start_queue_runners()

for step in range(max_steps):
    start_time = time.time()
    image_batch, label_batch = sess.run([image_train, labels_train])
    _, loss_value = sess.run([train_op, loss],
                             feed_dict={image_holder: image_batch,
                                        label_holder: label_batch})
    # time.time must be called: time.time() returns the current timestamp
    duration = float(time.time() - start_time)
    if step % 10 == 0:
        examples_per_sec = batchsize / duration
        sec_per_batch = float(duration)
        format_str = ('step %d, loss=%0.2f (%0.1f example/sec; %0.3f sec/batch)')
        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))
```

(Note on the timing lines: the error I ran into when enabling them came from writing `time.time - start_time`. `time.time` is a function, so subtracting a float from it without calling it raises a TypeError. It has to be `time.time() - start_time`. If you hit a different problem with this part, please let me know.)