A Demo Implementation of the TensorFlow Object Detection API


Environment: Python 3.5, with the Python packages referenced in the code and the TensorFlow library installed.
1. Overview
TensorFlow provides an object detection API that can fairly accurately identify and track objects of various classes in images and video. This post does not dig into the underlying algorithms; instead, it walks through two demo programs that show how the API is used: the first detects objects in still images, the second detects objects in video. The API lives in the TensorFlow models repository: https://github.com/tensorflow/models/tree/477ed41e7e4e8a8443bc633846eb01e2182dc68a/object_detection
2. Still-Image Object Detection Demo
This demo essentially follows the example program that ships with TensorFlow, with a few quick-and-dirty modifications; the goal is simply to verify the API's detection and matching ability in practice.
First the API needs a trained model. TensorFlow provides one pre-trained on the COCO image dataset; frozen_inference_graph.pb is the file produced at the end of training, and it ships ready to use. If you want to train on your own samples, you must generate this .pb file yourself beforehand. The demo downloads the compressed model archive online and unpacks it; once the first download has completed, the download code can be removed.
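For reference, the download step that the listing below omits looks roughly like this. This is a minimal sketch reconstructed from the official tutorial; DOWNLOAD_BASE is the model download URL used by the TensorFlow examples, and the existence check is an added convenience so later runs skip the download:

import os
import six.moves.urllib as urllib

# Model download URL from the official object_detection tutorial (assumption:
# reconstructed here because the listing below omits this step).
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
MODEL_FILE = 'ssd_mobilenet_v1_coco_11_06_2017.tar.gz'

if not os.path.exists(MODEL_FILE):  # skip re-downloading on later runs
    opener = urllib.request.URLopener()
    opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)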
Next come the test images. Each test image is opened with Python's PIL and converted by numpy into an array. The conversion could also be handled by OpenCV: cv2.imread(image_path) returns the image's array directly.
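The two loading routes look like this; a small sketch with a hypothetical image path, noting that cv2.imread returns BGR channel order while the PIL route yields RGB:

import numpy as np
import cv2
from PIL import Image

image_path = 'test_images/image1.jpg'  # hypothetical test image

# Route 1: PIL + numpy, as used in the demo (RGB channel order)
image = Image.open(image_path)
(im_width, im_height) = image.size
image_np = np.array(image.getdata()).reshape(
    (im_height, im_width, 3)).astype(np.uint8)

# Route 2: OpenCV one-liner; imread returns BGR, so convert to RGB
# before feeding the detector if the colors are to match
image_np_cv = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)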
Next is the core of TensorFlow object detection: running the image through the model held in a tf.Graph(). From the graph we fetch the image input tensor plus the outputs: the match scores (confidence of each detection), the detected classes, the bounding boxes, and the number of detections. These outputs come back as arrays and are kept for the display step.
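Stripped to its essentials, this step looks like the sketch below. It assumes detection_graph has already been loaded from the .pb file as in the full listing, and that image_np is the array from the previous step; the tensor names are the ones baked into the frozen COCO model:

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        fetches = [detection_graph.get_tensor_by_name(n + ':0') for n in
                   ('detection_boxes', 'detection_scores',
                    'detection_classes', 'num_detections')]
        # One forward pass; the model expects a batch of shape [1, h, w, 3]
        boxes, scores, classes, num = sess.run(
            fetches, feed_dict={image_tensor: image_np[np.newaxis, ...]})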
Finally, matplotlib displays the result. Besides the plt.imshow(image_np) call, a trailing pylab.show() is needed for the image to actually appear.
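In other words, the display step amounts to this (assuming image_np already has the boxes and labels drawn on it):

from matplotlib import pyplot as plt
import pylab

plt.figure(figsize=(12, 8))
plt.imshow(image_np)  # queue the annotated image on the current figure
pylab.show()          # required outside a notebook to actually open the window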
The full source code is as follows:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import pylab
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
import cv2

# This is needed to display the images.
#%matplotlib inline

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from utils import label_map_util
from utils import visualization_utils as vis_util

# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
MODEL_FILE = MODEL_NAME + '.tar.gz'
# Path to the frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that are used to add the correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90

# Extract frozen_inference_graph.pb from the downloaded archive.
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())

# Load the frozen graph into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

# Map class indices to human-readable category names.
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your own images, just add their paths to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i))
                    for i in range(1, 3)]
# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    for image_path in TEST_IMAGE_PATHS:
      image = Image.open(image_path)
      # The array-based representation of the image will be used later in order
      # to prepare the result image with boxes and labels on it.
      image_np = load_image_into_numpy_array(image)
      #image_np = cv2.imread(image_path)
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
      # Each box represents a part of the image where a particular object was detected.
      boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
      # Each score represents the level of confidence for each of the objects.
      # The score is shown on the result image, together with the class label.
      scores = detection_graph.get_tensor_by_name('detection_scores:0')
      classes = detection_graph.get_tensor_by_name('detection_classes:0')
      num_detections = detection_graph.get_tensor_by_name('num_detections:0')
      # Actual detection.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes, scores, classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
      plt.figure(figsize=IMAGE_SIZE)
      plt.imshow(image_np)
      pylab.show()

The output looks like this:
[result images omitted]
3. Video Object Detection Demo
Video detection is a variant of image detection: grab each frame of the video, treat it as an image, and run the TensorFlow detection core described above on every frame to detect and track the objects in it.
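The glue for this is moviepy's fl_image, which maps a frame-level function over the whole clip. A minimal sketch with a hypothetical no-op frame function:

from moviepy.editor import VideoFileClip

def annotate_frame(frame):
    # hypothetical stand-in for the real detector: fl_image hands each
    # frame in as an RGB numpy array of shape (h, w, 3) and expects an
    # array of the same shape back
    return frame

clip = VideoFileClip("video1.mp4").subclip(0, 2)
out = clip.fl_image(annotate_frame)        # runs annotate_frame on every frame
out.write_videofile("out.mp4", audio=False)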
The moviepy library handles the video editing here. The main code is as follows:

white_output = 'video_out.mp4'
clip = VideoFileClip("video1.mp4").subclip(0, 2)      # moviepy reads the source video
white_clip = clip.fl_image(process_image)             # NOTE: this function expects color images!!
white_clip.write_videofile(white_output, audio=False) # run every frame through TensorFlow object detection
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format(white_output))
finalclip = VideoFileClip("video_out.mp4")
finalclip.write_gif("final.gif")

For easier display, the mp4 output is then converted to a gif. The input file video1.mp4 can be any video of your choosing.
The full source code is as follows:

# Import everything needed to edit/save/watch video clips
from moviepy.editor import VideoFileClip
import tensorflow as tf
from IPython.display import HTML
from PIL import Image
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import zipfile
import pylab
from collections import defaultdict
from io import StringIO

sys.path.append("..")
from utils import label_map_util
from utils import visualization_utils as vis_util

# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
MODEL_FILE = MODEL_NAME + '.tar.gz'
# Path to the frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that are used to add the correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90

# Extract frozen_inference_graph.pb from the downloaded archive.
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())

# Load the frozen graph into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

# Map class indices to human-readable category names.
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def detect_objects(image_np, sess, detection_graph):
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # The score is shown on the result image, together with the class label.
    scores = detection_graph.get_tensor_by_name('detection_scores:0')
    classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    # Actual detection.
    (boxes, scores, classes, num_detections) = sess.run(
        [boxes, scores, classes, num_detections],
        feed_dict={image_tensor: image_np_expanded})
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        np.squeeze(boxes),
        np.squeeze(classes).astype(np.int32),
        np.squeeze(scores),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    return image_np

def process_image(image):
    # NOTE: this must return a color image (3 channels) for the video processing
    # below, i.e. the final frame with boxes and labels drawn on it.
    # (Opening a new Session for every frame is slow; hoisting the session out
    # of this function is an easy optimization.)
    with detection_graph.as_default():
        with tf.Session(graph=detection_graph) as sess:
            image_process = detect_objects(image, sess, detection_graph)
            return image_process

white_output = 'video_out.mp4'
clip = VideoFileClip("video1.mp4").subclip(0, 2)      # moviepy reads the source video
white_clip = clip.fl_image(process_image)             # NOTE: this function expects color images!!
white_clip.write_videofile(white_output, audio=False) # run every frame through TensorFlow object detection
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format(white_output))

# Convert the mp4 output to a gif for easy display.
finalclip = VideoFileClip("video_out.mp4")
finalclip.write_gif("final.gif")

The output looks like this:
[result gif omitted]
