Video Captioning with Multi-Faceted Attention
(Submitted on 1 Dec 2016)
Recently, video captioning has been attracting an increasing amount of interest, due to its potential for improving accessibility and information retrieval. While existing methods rely on different kinds of visual features and model structures, they do not fully exploit relevant semantic information. We present an extensible approach to jointly leverage several kinds of visual features and semantic attributes. Our novel architecture builds on LSTMs for sentence generation, with several attention layers and two multimodal layers. The attention mechanism learns to automatically select the most salient visual features or semantic attributes, and the multimodal layer yields overall representations for the inputs and outputs of the sentence generation component. Experimental results on the challenging MSVD and MSR-VTT datasets show that our framework outperforms the state-of-the-art approaches, while ground-truth semantic attributes further raise the output quality to a near-human level.
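The abstract gives no equations, so the following is only a minimal sketch of the two components it names, under stated assumptions. It assumes the common additive (Bahdanau-style) soft attention, where each feature vector is scored against the current LSTM hidden state and the softmax-weighted sum becomes a context vector. It also assumes a "multimodal layer" modeled as a tanh over summed linear projections of several inputs. The function names, weight shapes, and toy values are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(h: Vector, feats: List[Vector], w_h: Matrix,
                   w_f: Matrix, w: Vector) -> Tuple[Vector, Vector]:
    """Additive attention (assumed form): score_i = w . tanh(W_h h + W_f f_i).

    Returns the attention weights and the weighted sum of the features.
    """
    proj_h = matvec(w_h, h)
    scores = []
    for f in feats:
        proj_f = matvec(w_f, f)
        hidden = [math.tanh(a + b) for a, b in zip(proj_h, proj_f)]
        scores.append(sum(wi * xi for wi, xi in zip(w, hidden)))
    alphas = softmax(scores)
    dim = len(feats[0])
    context = [sum(a * f[d] for a, f in zip(alphas, feats)) for d in range(dim)]
    return alphas, context


def multimodal_layer(inputs: List[Vector], weights: List[Matrix]) -> Vector:
    """Hypothetical fusion layer: tanh(sum_k W_k x_k) over several inputs."""
    out: Vector = []
    for x, wk in zip(inputs, weights):
        p = matvec(wk, x)
        out = p if not out else [a + b for a, b in zip(out, p)]
    return [math.tanh(v) for v in out]


# Toy usage: identity projections, a 2-d hidden state, two 2-d frame features.
identity = [[1.0, 0.0], [0.0, 1.0]]
h_t = [1.0, 0.0]                          # hypothetical LSTM hidden state
frame_feats = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical visual features
alphas, ctx = soft_attention(h_t, frame_feats, identity, identity, [1.0, 1.0])
fused = multimodal_layer([ctx, h_t], [identity, identity])
```

Under this reading, one would call `soft_attention` once per feature type (for example frame features, motion features, or semantic-attribute embeddings), each with its own weights. The resulting context vectors would then be passed together to `multimodal_layer`. Summing projections before the nonlinearity keeps the fusion extensible, because adding a new feature type only adds one more term.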
Submission history
From: Xiang Long
[v1] Thu, 1 Dec 2016 13:11:29 GMT (733kb, D)