Two CVPR 2017 Papers on Skeleton-Based Action Recognition


Tags (space-separated): paper reading


I. ST-NBNN:

No neural network is used.
Each 3D action instance is represented by a collection of temporal stages composed by 3D poses, and each pose in stages is presented by a collection of spatial joints.

1. Introduction

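Learning-based classifiers: learning-based skeleton classification methods [5, 24, 35, 21, 14, 13] have already made considerable progress.
Non-parametric classifiers: classification methods that need no learning or parameter training have not yet been well explored.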

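  • Two motivations
    1:
    2: Unlike an image, which has thousands upon thousands of pixels, a skeleton has only a dozen or so joints, so a complex end-to-end model is unnecessary and a non-parametric model can do the job.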
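  • Method
    1: By using a stage-to-class distance and a bilinear classifier, the model combines the strengths of parametric and non-parametric models.
    2: By identifying key temporal stages and spatial joints (key frames and key skeleton joints), the model can extract the essential spatio-temporal patterns.

  • Results
    With only a linear classifier, it outperforms many end-to-end models.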
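
The bilinear idea in the method list can be illustrated with a small sketch. It assumes (this is not stated in these notes) that each class c yields a matrix X_c of nearest-neighbor distances with one row per joint and one column per stage, and that the class score is u^T X_c v with the smallest score winning. The names `bilinear_predict`, `u`, and `v` are hypothetical, and the weights are set by hand here rather than learned.

```python
import numpy as np

def bilinear_predict(X, u, v):
    """X: (C, M, N) distances per class, joint, stage. Returns (label, scores)."""
    # scores[c] = u^T X[c] v, a weighted sum of joint/stage distances
    scores = np.einsum("m,cmn,n->c", u, X, v)
    return int(np.argmin(scores)), scores

# Toy distances (made up): 2 classes, 3 joints, 2 stages.
X = np.array([
    [[1.0, 5.0], [1.0, 5.0], [9.0, 9.0]],   # class 0
    [[4.0, 1.0], [4.0, 1.0], [0.0, 0.0]],   # class 1
])

# Uniform weights reduce to a plain sum of all distances.
uniform_label, uniform_scores = bilinear_predict(X, np.ones(3), np.ones(2))

# Hand-set weights: ignore joint 2 and stage 1 (assumed uninformative).
u = np.array([1.0, 1.0, 0.0])
v = np.array([1.0, 0.0])
weighted_label, weighted_scores = bilinear_predict(X, u, v)
```

In this toy case the uniform weights pick class 1 while the hand-set weights pick class 0, which is the kind of effect that emphasizing key stages and joints is meant to have.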

Various skeleton-based spatial / temporal / spatial-temporal models

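  • spatial
    Split the skeleton into several parts and use a neural network to find the relations between those parts.
    Joints: compute quantities such as the angles between joints.
  • temporal
    There are several kinds; I do not fully understand them.
  • spatial-temporal:
    LSTMs analyze skeleton information in the spatio-temporal domain.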
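
As a small illustration of the joint-angle features mentioned under "spatial", the sketch below computes the angle at one joint from three 3D positions. The function name `joint_angle` and the sample coordinates are assumptions made for this example, not taken from any of the cited methods.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in radians at joint b formed by the bones b->a and b->c."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical shoulder, elbow, and wrist positions forming a right angle.
shoulder, elbow, wrist = [0, 1, 0], [0, 0, 0], [1, 0, 0]
angle = joint_angle(shoulder, elbow, wrist)
```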

NBNN

3. Proposed Method

  • (1) stage-descriptor
    We first introduce a set of stage-descriptors to represent a 3D
    sequence (Sec. 3.1).
    Descriptor contents: pose and velocity
    1 video = N stages = N × (l frames)

  • (2) NBNN classifies the descriptors
    Then NBNN [2] is used as a basic framework to classify actions (Sec. 3.2).

  • (3) Learning temporal and spatial weights
    Finally, the learning of spatial and temporal weights is introduced to discover
    key poses and spatial joints for action recognition (Sec. 3.3)
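
Steps (1) and (2) can be sketched roughly as follows. This is a simplified illustration, not the paper's exact formulation: it assumes equal-length stages, uses frame differences as velocity, and applies plain NBNN (summing, per class, each query stage's squared distance to its nearest stored descriptor). The function names and the toy "wave"/"kick" sequences are hypothetical, and the weight learning of step (3) is left out.

```python
import numpy as np

def stage_descriptors(seq, n_stages):
    """seq: (T, J, 3) joint positions. Returns (n_stages, D) descriptors.

    Each stage is a run of l = T // n_stages consecutive frames; its
    descriptor concatenates the poses and velocities of those frames.
    """
    l = seq.shape[0] // n_stages                  # remainder frames are dropped
    vel = np.diff(seq, axis=0, prepend=seq[:1])   # frame 0 gets zero velocity
    out = []
    for s in range(n_stages):
        sl = slice(s * l, (s + 1) * l)
        out.append(np.concatenate([seq[sl].ravel(), vel[sl].ravel()]))
    return np.stack(out)

def nbnn_classify(query, bank):
    """bank: {label: (K, D) stored stage descriptors}. Returns (label, totals)."""
    totals = {}
    for label, descs in bank.items():
        # Squared distance from every query stage to every stored descriptor.
        d2 = ((query[:, None, :] - descs[None, :, :]) ** 2).sum(axis=-1)
        totals[label] = float(d2.min(axis=1).sum())
    return min(totals, key=totals.get), totals

# Toy data: 8 frames, 2 joints; "wave" moves along x, "kick" along y.
t = np.linspace(0.0, 1.0, 8)[:, None, None]
wave = t * np.array([1.0, 0.0, 0.0]) * np.ones((1, 2, 1))
kick = t * np.array([0.0, 1.0, 0.0]) * np.ones((1, 2, 1))
bank = {"wave": stage_descriptors(wave, 4), "kick": stage_descriptors(kick, 4)}

# A slightly shifted wave sequence should land nearest the "wave" class.
label, totals = nbnn_classify(stage_descriptors(wave + 0.01, 4), bank)
```

No parameters are trained here: the "model" is just the stored descriptors, which is what makes the approach non-parametric.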

II. A Lie-Group-Based Network for Action Recognition

Deep Learning on Lie Groups for Skeleton-based Action Recognition
Lie group representations for action recognition
Paper video

The paper proposes a new neural network, LieNet, which learns Lie-group-based 3D skeleton features for action recognition.

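  • 1. It combines Lie group structure with a neural network. Compared with conventional architectures, the network is adapted to Lie groups by adding RotMap, RotPooling, and LogMap layers.
  • 2. Within this architecture, a corresponding variant of stochastic gradient descent is also explored so that backpropagation can be used.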
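
To make the layer names more concrete, here is a rough NumPy sketch of what a RotMap-style and a LogMap-style operation could look like on a single 3×3 rotation matrix. It assumes RotMap left-multiplies the input by a rotation-matrix weight and LogMap converts a rotation into its axis-angle vector; pooling, training, and the manifold-aware SGD update are omitted. The helper names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_map(W, R):
    """Left-multiply by a rotation W; a product of rotations is a rotation."""
    return W @ R

def log_map(R):
    """Rotation matrix -> axis-angle vector (assumes angle below pi)."""
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos)
    if theta < 1e-8:
        return np.zeros(3)
    # The skew part (R - R^T) / 2 encodes sin(theta) * axis.
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]]) / 2.0
    return theta * w / np.sin(theta)

R = rot_z(0.3)          # input rotation feature
W = rot_z(0.2)          # stand-in for a learned rotation weight
out = rot_map(W, R)     # still a valid rotation matrix
vec = log_map(out)      # flat vector usable by ordinary layers
```

Because the weight is itself a rotation, the output stays on the rotation group, and only the final log map flattens it into a Euclidean vector.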

Two demos shown in the video
Real-time human pose recognition in parts from a single depth image
The authors' LieNet, demonstrated on the NTU-RGBD dataset (A. Shahroudy et al., CVPR 2016)

III. Skeleton-Related Datasets

MSR-Action3D

Finally, the MSR Action3D [45] represents one of the most used datasets for HAR. It includes 20 activities performed by 10 subjects, 2 or 3 times. In total, 567 sequences of depth (320 × 240) and skeleton frames are provided, but 10 of them have to be discarded because the skeletons are either missing or affected by too many errors. The following activities are included in the dataset: high arm wave, horizontal arm wave, hammer, hand catch, forward punch, high throw, draw X, draw tick, draw circle, hand clap, two-hand wave, side boxing, bend, forward kick, side kick, jogging, tennis swing, tennis serve, golf swing, and pickup and throw. The dataset has been collected using a structured-light depth camera at 15 fps; RGB data are not available.

UTKinect

http://cvrc.ece.utexas.edu/KinectDatasets/HOJ3D.html

Berkeley MHAD

HDM05

NTU RGBD CVPR2016

  • RGB videos
  • depth map sequences
  • 3D skeletal data
  • infrared videos

The 3D skeleton information is mapped onto the video.
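
One common way to overlay 3D joints on video frames is a pinhole-camera projection, sketched below. This is a generic illustration, not the dataset's own tooling: the intrinsics `fx`, `fy`, `cx`, `cy` are made-up values, and the camera-space Y axis is assumed to point up (so it is flipped for image rows).

```python
import numpy as np

def project_joints(joints_xyz, fx, fy, cx, cy):
    """joints_xyz: (J, 3) camera-space points with Z > 0. Returns (J, 2) pixels."""
    X, Y, Z = joints_xyz[:, 0], joints_xyz[:, 1], joints_xyz[:, 2]
    u = fx * X / Z + cx
    v = cy - fy * Y / Z   # image rows grow downward; camera Y assumed up
    return np.stack([u, v], axis=1)

# Hypothetical joints (meters) and hypothetical intrinsics for a 1920x1080 frame.
joints = np.array([[0.0, 0.0, 2.0], [0.5, 0.5, 2.0]])
pixels = project_joints(joints, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
```

With these assumed values, a joint on the optical axis lands at the image center, and the second joint lands to the right of and above it.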
