NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis
Source: Internet. Editor: 程序博客网. Time: 2024/06/03 22:57
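The paper introduces the NTU RGB+D action dataset and proposes the Part-Aware LSTM network. The main points are as follows:
I: Dataset
The dataset contains 3D skeletons (body joints), masked depth maps, full depth maps, RGB videos, and IR videos. Each action is captured from three horizontal camera angles (-45°, 0°, +45°).
II: Part-Aware LSTM Network
This is the relatively novel part of the paper. Instead of keeping the long-term memory of the whole body in a single cell, the paper takes a part-based approach: the memory of each body part is stored independently, and the part memories are then concatenated to form one large cell.
The idea is similar to that of many other papers. Two examples: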
(1):Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition
(2):A Hierarchical Deep Temporal Model for Group Activity Recognition
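First, to summarize the figure from the paper: at time t the body information is split into parts, which amounts to focusing on detail. The information of each part is learned separately and then gathered into one large cell. Compared with feeding everything into an ordinary LSTM, the network structure is slightly more complex and pays more attention to detail. It has a somewhat hierarchical feel, except that the hierarchy is placed inside the LSTM itself. Both papers listed above use a hierarchical approach: the first is body-part based and uses multi-layer BRNNs for action recognition; the second works on video, treats each person as a part, and finally recognizes the group activity. For details on those papers, see my earlier blog posts.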
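To make the part-based cell more concrete, here is a minimal NumPy sketch of one time step as I understand it. It is not the authors' code: the function names, weight layout, initialization, and part sizes are all my own assumptions. Each part has its own input gate, forget gate, and candidate, and updates only its own memory; the output gate is shared and acts on the concatenated memory.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(part_dims, cell_dim, rng):
    """Random weights for a hypothetical part-aware LSTM cell.

    part_dims: input size of each body part (illustrative values only).
    cell_dim:  memory size kept for each part.
    """
    hidden_dim = cell_dim * len(part_dims)
    parts = []
    for d in part_dims:
        # One matrix per part yields its input gate, forget gate and candidate.
        parts.append({
            "W": rng.normal(0.0, 0.1, size=(3 * cell_dim, d + hidden_dim)),
            "b": np.zeros(3 * cell_dim),
        })
    # The output gate is shared: it sees every part input and the full hidden state.
    out = {
        "W": rng.normal(0.0, 0.1, size=(hidden_dim, sum(part_dims) + hidden_dim)),
        "b": np.zeros(hidden_dim),
    }
    return {"parts": parts, "out": out, "cell_dim": cell_dim}

def plstm_step(params, x_parts, h_prev, c_prev_parts):
    """One time step: per-part memory update, then a shared output gate."""
    n = params["cell_dim"]
    c_parts = []
    for p, x_p, c_prev in zip(params["parts"], x_parts, c_prev_parts):
        z = p["W"] @ np.concatenate([x_p, h_prev]) + p["b"]
        i = sigmoid(z[:n])          # input gate of this part
        f = sigmoid(z[n:2 * n])     # forget gate of this part
        g = np.tanh(z[2 * n:])      # candidate memory of this part
        c_parts.append(f * c_prev + i * g)
    full_input = np.concatenate(list(x_parts) + [h_prev])
    o = sigmoid(params["out"]["W"] @ full_input + params["out"]["b"])
    # The part memories are concatenated into one large cell before the output.
    h = o * np.tanh(np.concatenate(c_parts))
    return h, c_parts
```

Note that the memory of one part never enters the update of another part directly. The parts only interact through the shared hidden state h, which is what distinguishes this design from a single whole-body cell.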
III: Experimental Setup
(1):How important is the skeleton normalization step, described in the experimental setup section?
In the extension of our experiments, we found out the normalization is not vital. You can skip the normalization step and it should work fine. Actually the network is supposed to learn how to normalize the data by itself.
(2):How did you choose the main actor in the preprocessing step?
We used a heuristic. It's very simple (but not necessarily correct for all the samples). Consider the variance of the X, Y, and Z values of all the joints and add them up. We took the body with the higher value as the main subject.
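The heuristic in this answer can be sketched in a few lines of NumPy. This is only one reading of the description: I assume each skeleton is an array of shape (frames, joints, 3) and that the variance of each axis is pooled over all frames and joints. The function names are hypothetical.

```python
import numpy as np

def motion_score(skeleton):
    """Sum of the variances of the X, Y and Z values over all joints.

    skeleton: array of shape (frames, joints, 3); this layout is an assumption.
    """
    coords = np.asarray(skeleton, dtype=float).reshape(-1, 3)
    # Variance per axis, pooled over frames and joints, then added up.
    return float(coords.var(axis=0).sum())

def pick_main_actor(skeletons):
    """Return the index of the body with the highest score."""
    scores = [motion_score(s) for s in skeletons]
    return int(np.argmax(scores))
```

As the authors say, this is simple and not necessarily correct for every sample.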
(3):How did you handle the variable subject numbers (one or two) in the input of the network?
Our inputs initially include two sets of joints (for two skeletons). When we observed just one, the second set was filled with zeros. When we observed two or more, we decided which one would be the main subject and which the second by measuring the amount of motion of their joints. Also, some of the detected skeletons are noise, such as tables and seats! You can eliminate them by filtering out the skeletons that do not have a reasonable ratio of Y spread to X spread over all of their joints.
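A rough sketch of this preprocessing, under my own assumptions: skeletons are arrays of shape (frames, joints, 3), the spread is measured per frame and averaged, and the threshold `min_ratio` is a made-up value, since the answer does not say what counts as "reasonable". All names here are hypothetical.

```python
import numpy as np

def motion_score(skeleton):
    """Sum of the per-axis variances, used here as the amount of motion."""
    coords = np.asarray(skeleton, dtype=float).reshape(-1, 3)
    return float(coords.var(axis=0).sum())

def is_plausible_body(skeleton, min_ratio=1.0):
    """Keep skeletons that are at least min_ratio times as tall as they are wide.

    min_ratio is an assumed threshold, not a value given by the authors.
    """
    s = np.asarray(skeleton, dtype=float)
    x_spread = (s[:, :, 0].max(axis=1) - s[:, :, 0].min(axis=1)).mean()
    y_spread = (s[:, :, 1].max(axis=1) - s[:, :, 1].min(axis=1)).mean()
    return bool(y_spread >= min_ratio * x_spread)

def build_input(skeletons, frames, joints):
    """Return an array of shape (2, frames, joints, 3) for the network input."""
    # Drop noisy detections such as tables and seats.
    bodies = [s for s in skeletons if is_plausible_body(s)]
    # The body with the most motion goes first, the runner-up second.
    bodies.sort(key=motion_score, reverse=True)
    out = np.zeros((2, frames, joints, 3))
    for slot, body in enumerate(bodies[:2]):
        out[slot] = body
    # Slots without a body stay filled with zeros.
    return out
```

With a single valid body, the second slot remains all zeros, which matches the zero-filling described in the answer.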
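IV: Summary
On reflection, the authors' idea of modifying the LSTM itself is quite nice: compared with the hierarchical approach, it reduces computation and simplifies the network. These are only my personal impressions; please point out any mistakes. Thank you very much!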