Learning Keras: Data Preprocessing
Source: Internet  Editor: 程序博客网  Time: 2024/05/17 00:07
1. Data preprocessing is essential. This post uses input preprocessing for the MNIST dataset, the simplest case, as an example.
A. Set the random seed
np.random.seed(1337)  # for reproducibility
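As a quick check of what the seed buys you, the sketch below reseeds NumPy's global generator and confirms that the same seed reproduces the same draws. The helper name `draw` is made up for illustration. Note that this only covers NumPy's own generator; random state held by the Keras backend is seeded separately.

```python
import numpy as np


def draw(seed, n=5):
    """Hypothetical helper: seed NumPy's global generator, then draw n uniform samples."""
    np.random.seed(seed)
    return np.random.rand(n)


first = draw(1337)
second = draw(1337)   # same seed as `first`
other = draw(42)      # different seed

print(np.array_equal(first, second))  # same seed gives the same sequence
print(np.array_equal(first, other))   # a different seed gives different values
```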
B. Reshape the input to the expected dimensions. Here each 28x28 image is flattened into a one-dimensional array of 784 values.
X_train = X_train.reshape(60000, 784)
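The reshape can be tried without downloading MNIST. The sketch below uses a small random array as a stand-in for the real images (an assumption for illustration) and passes -1 so that NumPy infers the 784 instead of hard-coding the sample count.

```python
import numpy as np

# Stand-in for MNIST: 6 fake grayscale images of 28x28 pixels (not real data).
rng = np.random.RandomState(0)
images = rng.randint(0, 256, size=(6, 28, 28)).astype(np.uint8)

# Keep the sample axis and flatten everything else; -1 is inferred as 28 * 28 = 784.
flat = images.reshape(images.shape[0], -1)

print(images.shape, '->', flat.shape)
```

Because NumPy arrays are row-major by default, pixel (row, col) of an image ends up at index row * 28 + col of the flattened vector.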
Convert the class labels to one-hot encoding. This step is required for multi-class classification with the categorical_crossentropy loss used below.
one_hot_labels = keras.utils.np_utils.to_categorical(labels, num_classes=10)
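To make the encoding concrete, here is a minimal NumPy sketch of what a one-hot conversion produces. The function `one_hot` is a hypothetical stand-in written for illustration, not the Keras implementation.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Hypothetical helper: map integer labels of shape (n,) to an (n, num_classes) 0/1 matrix."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Set a single 1 per row, in the column given by that row's label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


labels = np.array([5, 0, 4, 1, 9])
one_hot_labels = one_hot(labels, num_classes=10)

print(one_hot_labels.shape)
print(one_hot_labels[0])
```

Taking the argmax of each row recovers the original integer label, which is how predictions from a softmax output are usually read back.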
Do the training and test sets also need to be shuffled?
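If shuffling is needed, the important point is that samples and labels must be reordered with the same permutation. The sketch below shows this on toy arrays (the data is invented so that each row can be matched to its label). For MNIST itself, the comment in the full code states that the data is already shuffled, and Keras's fit also shuffles the training samples each epoch by default (shuffle=True).

```python
import numpy as np

rng = np.random.RandomState(1337)

# Toy data: row i is [4i, 4i+1, 4i+2, 4i+3] and its label is i.
X = np.arange(10 * 4, dtype=np.float32).reshape(10, 4)
y = np.arange(10)

# One shared permutation keeps every sample paired with its own label.
perm = rng.permutation(X.shape[0])
X_shuffled = X[perm]
y_shuffled = y[perm]

print(y_shuffled)
print(X_shuffled[:, 0] / 4)  # matches y_shuffled if pairs were preserved
```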
C. Convert the input data type and normalize the values to the range [0, 1]
X_train = X_train.astype('float32')
X_train /= 255
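The order of the two steps matters. The sketch below, on a made-up four-pixel array, shows that an in-place division on the raw uint8 data is rejected by NumPy, while casting to float32 first gives values between 0 and 1.

```python
import numpy as np

# Made-up pixel values in the uint8 range used by MNIST images.
pixels = np.array([[0, 64, 128, 255]], dtype=np.uint8)

# In-place true division cannot write float results back into a uint8 array.
try:
    raw = pixels.copy()
    raw /= 255
    cast_required = False
except TypeError:
    cast_required = True

# Cast first, then scale in place.
scaled = pixels.astype('float32')
scaled /= 255

print(cast_required)
print(scaled.dtype, scaled.min(), scaled.max())
```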
The complete MLP code for the MNIST dataset is as follows:
'''Trains a simple deep NN on the MNIST dataset.

Gets to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils

batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

model.summary()

#model.compile(loss='categorical_crossentropy',
#              optimizer=RMSprop(),
#              metrics=['accuracy'])
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.02),
              metrics=['accuracy'])

history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
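2. If the input is image data and the model is a CNN, handling the input dimensions is slightly more involved.
First, get to know the image_dim_ordering parameter in Keras 1.x. In Keras 2 this setting was renamed image_data_format.
"channels_last" corresponds to the former "tf", and "channels_first" corresponds to the former "th".
Taking a 128x128 RGB image as an example, "channels_first" organizes the data as (3,128,128), while "channels_last" organizes it as (128,128,3).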
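The two layouts can be compared directly in NumPy. The sketch below uses zero-filled placeholder arrays (not real images) and marks the red channel so the axis move can be verified. For multi-channel images the conversion is a transpose, not a reshape; for single-channel MNIST a reshape is enough because only a length-1 axis is added.

```python
import numpy as np

# Placeholder 128x128 RGB image in channels_last layout: (rows, cols, channels).
rgb_last = np.zeros((128, 128, 3), dtype=np.float32)
rgb_last[..., 0] = 1.0  # fill the red channel so it can be tracked

# channels_first layout: move the channel axis to the front.
rgb_first = np.transpose(rgb_last, (2, 0, 1))

# The same conversion for a batch keeps the sample axis in position 0.
batch_last = np.zeros((8, 128, 128, 3), dtype=np.float32)
batch_first = np.transpose(batch_last, (0, 3, 1, 2))

# Single-channel case (MNIST-like): adding the channel axis is a plain reshape.
gray = np.zeros((8, 28, 28), dtype=np.float32)
gray_last = gray.reshape(gray.shape[0], 28, 28, 1)
gray_first = gray.reshape(gray.shape[0], 1, 28, 28)

print(rgb_last.shape, '->', rgb_first.shape)
print(batch_last.shape, '->', batch_first.shape)
print(gray_last.shape, gray_first.shape)
```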
The complete CNN code for the MNIST dataset follows. Pay particular attention to input_shape and to how X_train/X_test are reshaped.
'''Trains a simple convnet on the MNIST dataset.

Gets to 99.25% test accuracy after 12 epochs
(there is still a lot of margin for parameter tuning).
16 seconds per epoch on a GRID K520 GPU.
'''

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

batch_size = 128
nb_classes = 10
nb_epoch = 12

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

#import gzip
#from six.moves import cPickle
#path=r'C:\Users\ll\.keras\datasets\mnist.pkl.gz'
#f = gzip.open(path, 'rb')
#(X_train, y_train), (x_valid,y_valid),(X_test, y_test) = cPickle.load(f, encoding='bytes')
#f.close()

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])