TensorFlow Text Classification
```python
import pandas as pd
import numpy as np
import tensorflow as tf
from collections import Counter
from sklearn.datasets import fetch_20newsgroups
```
How TensorFlow works
```python
import tensorflow as tf

my_graph = tf.Graph()
with tf.Session(graph=my_graph) as sess:
    x = tf.constant([1, 3, 6])
    y = tf.constant([1, 1, 1])
    op = tf.add(x, y)
    result = sess.run(fetches=op)
    print(result)
```

```
[2 4 7]
```
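As a sanity check (my addition, not part of the original post), the same element-wise addition can be reproduced eagerly with NumPy, without building a graph or opening a session:

```python
import numpy as np

# Element-wise addition of two vectors, the same computation
# the TensorFlow graph above performs via tf.add.
x = np.array([1, 3, 6])
y = np.array([1, 1, 1])
result = x + y
print(result)  # [2 4 7]
```

The difference is that TensorFlow 1.x first records the operation in a graph and only computes values when `sess.run` is called on a fetch.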
How to manipulate data and pass it to the Neural Network inputs
```python
vocab = Counter()
text = "Hi from Brazil"

for word in text.split(' '):
    word_lowercase = word.lower()
    vocab[word_lowercase] += 1

def get_word_2_index(vocab):
    word2index = {}
    for i, word in enumerate(vocab):
        word2index[word] = i
    return word2index
```
```python
word2index = get_word_2_index(vocab)
total_words = len(vocab)

matrix = np.zeros((total_words), dtype=float)
for word in text.split():
    matrix[word2index[word.lower()]] += 1

print("Hi from Brazil:", matrix)
```

```
Hi from Brazil: [ 1.  1.  1.]
```
```python
matrix = np.zeros((total_words), dtype=float)
text = "Hi"
for word in text.split():
    matrix[word2index[word.lower()]] += 1

print("Hi:", matrix)
```

```
Hi: [ 1.  0.  0.]
```
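The two snippets above repeat the same bag-of-words vectorization by hand. A small helper captures the pattern in one place (a sketch of my own; the names `build_vocab` and `text_to_vector` are not from the post):

```python
import numpy as np
from collections import Counter

def build_vocab(texts):
    """Count every lowercased word across the given texts."""
    vocab = Counter()
    for text in texts:
        for word in text.split(' '):
            vocab[word.lower()] += 1
    return vocab

def text_to_vector(text, word2index, total_words):
    """Bag-of-words vector: one slot per vocabulary word, holding its count."""
    vector = np.zeros(total_words, dtype=float)
    for word in text.split():
        vector[word2index[word.lower()]] += 1
    return vector

vocab = build_vocab(["Hi from Brazil"])
word2index = {word: i for i, word in enumerate(vocab)}
print(text_to_vector("Hi", word2index, len(vocab)))  # [1. 0. 0.]
```

Any text is mapped to a fixed-length vector of word counts, which is exactly the shape the network's input layer expects.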
Building the neural network
```python
categories = ["comp.graphics", "sci.space", "rec.sport.baseball"]

newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)
```
```python
print('total texts in train:', len(newsgroups_train.data))
print('total texts in test:', len(newsgroups_test.data))
```

```
total texts in train: 1774
total texts in test: 1180
```
```python
print('text', newsgroups_train.data[0])
print('category:', newsgroups_train.target[0])
```

```
text From: jk87377@lehtori.cc.tut.fi (Kouhia Juhana)
Subject: Re: More gray levels out of the screen
Organization: Tampere University of Technology
Lines: 21
Distribution: inet
NNTP-Posting-Host: cc.tut.fi

In article <1993Apr6.011605.909@cis.uab.edu> sloan@cis.uab.edu
(Kenneth Sloan) writes:
>
>Why didn't you create 8 grey-level images, and display them for
>1,2,4,8,16,32,64,128... time slices?

By '8 grey level images' you mean 8 items of 1bit images?
It does work(!), but it doesn't work if you have more than 1bit
in your screen and if the screen intensity is non-linear.

With 2 bit per pixel; there could be 1*c_1 + 4*c_2 timing,
this gives 16 levels, but they are linear if screen intensity is
linear.
With 1*c_1 + 2*c_2 it works, but we have to find the best
compinations -- there's 10 levels, but 16 choises; best 10 must be
chosen. Different compinations for the same level, varies a bit, but
the levels keeps their order.

Readers should verify what I wrote... :-)

Juhana Kouhia

category: 0
```
```python
vocab = Counter()

for text in newsgroups_train.data:
    for word in text.split(' '):
        vocab[word.lower()] += 1

for text in newsgroups_test.data:
    for word in text.split(' '):
        vocab[word.lower()] += 1
```
```python
print("Total words:", len(vocab))
```

```
Total words: 119930
```
```python
total_words = len(vocab)

def get_word_2_index(vocab):
    word2index = {}
    for i, word in enumerate(vocab):
        word2index[word.lower()] = i
    return word2index

word2index = get_word_2_index(vocab)

print("Index of the word 'the':", word2index['the'])
```

```
Index of the word 'the': 10
```
```python
def get_batch(df, i, batch_size):
    batches = []
    results = []
    texts = df.data[i*batch_size:i*batch_size+batch_size]
    categories = df.target[i*batch_size:i*batch_size+batch_size]

    for text in texts:
        layer = np.zeros(total_words, dtype=float)
        for word in text.split(' '):
            layer[word2index[word.lower()]] += 1
        batches.append(layer)

    for category in categories:
        y = np.zeros((3), dtype=float)
        if category == 0:
            y[0] = 1.
        elif category == 1:
            y[1] = 1.
        else:
            y[2] = 1.
        results.append(y)

    return np.array(batches), np.array(results)
```
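The label branch of `get_batch` builds the one-hot vectors with an if/elif chain. As an aside (a sketch of my own, not the post's code), the same encoding can be written in one NumPy expression, which also scales to any number of classes:

```python
import numpy as np

n_classes = 3
categories = np.array([0, 2, 1, 0])  # example class indices for four texts

# Row k of np.eye(n_classes) is the one-hot vector for class k, so
# indexing the identity matrix by the category array yields the
# whole label matrix at once.
one_hot = np.eye(n_classes)[categories]
print(one_hot)
```

Each row has a single 1 in the column of its class, matching the `y` vectors built above.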
```python
print("Each batch has 100 texts and each matrix has 119930 elements (words):",
      get_batch(newsgroups_train, 1, 100)[0].shape)
```

```
Each batch has 100 texts and each matrix has 119930 elements (words): (100, 119930)
```
```python
print("Each batch has 100 labels and each matrix has 3 elements (3 categories):",
      get_batch(newsgroups_train, 1, 100)[1].shape)
```

```
Each batch has 100 labels and each matrix has 3 elements (3 categories): (100, 3)
```
```python
# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 150
display_step = 1

# Network Parameters
n_hidden_1 = 100       # 1st layer number of features
n_hidden_2 = 100       # 2nd layer number of features
n_input = total_words  # Words in vocab
n_classes = 3          # Categories: graphics, sci.space and baseball

input_tensor = tf.placeholder(tf.float32, [None, n_input], name="input")
output_tensor = tf.placeholder(tf.float32, [None, n_classes], name="output")
```
```python
def multilayer_perceptron(input_tensor, weights, biases):
    # Hidden layer with RELU activation
    layer_1_multiplication = tf.matmul(input_tensor, weights['h1'])
    layer_1_addition = tf.add(layer_1_multiplication, biases['b1'])
    layer_1 = tf.nn.relu(layer_1_addition)

    # Hidden layer with RELU activation
    layer_2_multiplication = tf.matmul(layer_1, weights['h2'])
    layer_2_addition = tf.add(layer_2_multiplication, biases['b2'])
    layer_2 = tf.nn.relu(layer_2_addition)

    # Output layer
    out_layer_multiplication = tf.matmul(layer_2, weights['out'])
    out_layer_addition = out_layer_multiplication + biases['out']

    return out_layer_addition
```
```python
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
prediction = multilayer_perceptron(input_tensor, weights, biases)

# Define loss and optimizer
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=output_tensor))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

# Initializing the variables
init = tf.global_variables_initializer()
```
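`softmax_cross_entropy_with_logits` first turns the raw output-layer logits into probabilities with softmax, then measures the cross-entropy against the one-hot labels. For intuition, the per-example quantity can be written in plain NumPy (a sketch of the math only, not TensorFlow's actual implementation):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Cross-entropy between softmax(logits) and one-hot labels, per row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return -(labels * np.log(probs)).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1]])   # raw scores for one example over 3 classes
labels = np.array([[1.0, 0.0, 0.0]])   # true class is 0
print(softmax_cross_entropy(logits, labels))
```

The loss is small when the softmax puts most probability on the true class and grows without bound as that probability goes to zero, which is what drives the optimizer.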
```python
# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(len(newsgroups_train.data) / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = get_batch(newsgroups_train, i, batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            c, _ = sess.run([loss, optimizer],
                            feed_dict={input_tensor: batch_x, output_tensor: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "loss=", "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(output_tensor, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    total_test_data = len(newsgroups_test.target)
    batch_x_test, batch_y_test = get_batch(newsgroups_test, 0, total_test_data)
    print("Accuracy:", accuracy.eval({input_tensor: batch_x_test, output_tensor: batch_y_test}))
```
```
Epoch: 0001 loss= 1133.908114347
Epoch: 0002 loss= 329.093700409
Epoch: 0003 loss= 111.876660109
Epoch: 0004 loss= 72.552971845
Epoch: 0005 loss= 16.673050320
Epoch: 0006 loss= 16.481995190
Epoch: 0007 loss= 4.848220565
Epoch: 0008 loss= 0.759822878
Epoch: 0009 loss= 0.000000000
Epoch: 0010 loss= 0.079848485
Optimization Finished!
Accuracy: 0.75
```
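The accuracy test at the end compares the `argmax` of the predictions with the `argmax` of the one-hot labels. Stripped of TensorFlow, the same logic is just this (the logits and labels below are made-up illustrative values, not the model's real outputs):

```python
import numpy as np

# Fake logits for 4 test examples over 3 classes, plus their one-hot labels.
predictions = np.array([[2.1, 0.3, 0.1],
                        [0.2, 1.8, 0.4],
                        [0.1, 0.2, 3.0],
                        [1.5, 1.6, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])

# A prediction is correct when the highest-scoring class matches the label;
# accuracy is the mean of those 0/1 outcomes.
correct = np.argmax(predictions, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.astype(float).mean()
print("Accuracy:", accuracy)  # Accuracy: 0.75
```

Here the last example is scored wrong (class 1 narrowly beats the true class 0), so 3 of 4 predictions are correct, giving 0.75, the same figure the network happens to reach on the test set above.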